Git development

Git development
 help / color / mirror / Atom feed

* Re: Git is not scalable with too many refs/*
From: Martin Fick @ 2011-09-30 21:02 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Rast, René Scharfe, Julian Phillips,
	Michael Haggerty
In-Reply-To: <201109301041.13848.mfick@codeaurora.org>

On Friday, September 30, 2011 10:41:13 am Martin Fick wrote:
> massive fix to bring it down to 7.5mins was awesome.
> 7-8mins sounded pretty good 2 weeks ago, especially when
> a checkout took 5+ mins!  but now that almost every
> other operation has been sped up, that is starting to
> feel a bit on the slow side still.  My spidey sense
> tells me something is still not quite right in the fetch
> path.

I guess I overlooked that there were 2 sides to this 
equation.  Even though I have been doing my fetches locally, 
I was using the file:// protocol and it appears that the 
remote was running git 1.7.6 which was in my path the whole 
time.  So eliminating that from my path and pointing to the 
the "best" binary with all the fixes for both remote and 
local, the full fetch does indeed speed up quite a bit, it 
goes from about 7.5mins down to ~5m!  Previously the remote 
seemed to primarily spend the extra time after:

 remote: Counting objects: 316961

yet before:

 remote: Compressing objects

> Here is some more data to backup my spidey sense: after
> all the improvements, a noop fetch of all the changes
> (noop meaning they are all already uptodate) takes
> around 3mins with a non gced (non packed refs) case. 
> That same noop only takes ~12s in the gced (packed ref
> case)!

I believe (it is hard to be go back and be sure) that this 
means that the timings above which gave me 3mins were 
because the remote was using git 1.7.6.  Now, with the good 
binary, in both repos (packed and unpacked), I get great 
warm cache times of about 11-13s for a noop fetch.  It is 
interesting to note that cold cache times are 20s for packed 
refs and 1m30s for unpacked refs.  I guess that makes some 
sense.  

But, this does leave me thinking that packed refs should 
become the default and that there should be a config option 
to disable it?  This still might help a fetch?

Since a full sync is now done to about 5mins, I broke down 
the output a bit.  It appears that the longest part (2:45m) 
is now the time spent scrolling though each change still.  
Each one of these takes about 2ms:
 * [new branch]      refs/changes/99/71199/1 -> 
refs/changes/99/71199/1

Seems fast, but at about 80K... So, are there any obvious N 
loops over the refs happening inside each of of the [new 
branch] iterations?

-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

^ permalink raw reply

* Re: [PATCH 2/2] use new Git::config_path() for aliasesfile
From: Junio C Hamano @ 2011-09-30 19:55 UTC (permalink / raw)
  To: Cord Seele; +Cc: Matthieu Moy, git, Eric Wong, Cord Seele, Jakub Narebski
In-Reply-To: <1317379945-9355-3-git-send-email-cowose@gmail.com>

I think the addition of "config --path" support is a good idea, but the
resulting code suffers from too many cut&paste cruft across the config*
family of methods.

How about doing a bit of refactoring, perhaps something like this, on top
as a separate patch?

I tried to be careful to still forcing the "one value only" for config_bool
and config_int, but extra sets of eyeballs would be needed.

 perl/Git.pm |   93 +++++++++++++++++-----------------------------------------
 1 files changed, 27 insertions(+), 66 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index c279bfb..f0a6e92 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -563,22 +563,18 @@ sub wc_chdir {
 }
 
 
-=item config ( VARIABLE )
+=item _config_common ( VARIABLE )
 
-Retrieve the configuration C<VARIABLE> in the same manner as C<config>
-does. In scalar context requires the variable to be set only one time
-(exception is thrown otherwise), in array context returns allows the
-variable to be set multiple times and returns all the values.
-
-This currently wraps command('config') so it is not so fast.
+Common subroutine to implement bulk of what the config* family of methods
+do. This wraps command('config') so it is not so fast.
 
 =cut
 
-sub config {
-	my ($self, $var) = _maybe_self(@_);
-
+sub _config_common {
+	my ($self, $var, $opts) = _maybe_self(@_);
+	
 	try {
-		my @cmd = ('config');
+		my @cmd = ('config', $opts->{'kind'} ? @{$opts->{'kind'}} : ());
 		unshift @cmd, $self if $self;
 		if (wantarray) {
 			return command(@cmd, '--get-all', $var);
@@ -594,6 +590,21 @@ sub config {
 			throw $E;
 		}
 	};
+
+}
+
+=item config ( VARIABLE )
+
+Retrieve the configuration C<VARIABLE> in the same manner as C<config>
+does. In scalar context requires the variable to be set only one time
+(exception is thrown otherwise), in array context returns allows the
+variable to be set multiple times and returns all the values.
+
+=cut
+
+sub config {
+	my ($self, $var) = _maybe_self(@_);
+	return _config_common($self, $var, +{});
 }
 
 
@@ -603,60 +614,24 @@ Retrieve the bool configuration C<VARIABLE>. The return value
 is usable as a boolean in perl (and C<undef> if it's not defined,
 of course).
 
-This currently wraps command('config') so it is not so fast.
-
 =cut
 
 sub config_bool {
 	my ($self, $var) = _maybe_self(@_);
-
-	try {
-		my @cmd = ('config', '--bool', '--get', $var);
-		unshift @cmd, $self if $self;
-		my $val = command_oneline(@cmd);
-		return undef unless defined $val;
-		return $val eq 'true';
-	} catch Git::Error::Command with {
-		my $E = shift;
-		if ($E->value() == 1) {
-			# Key not found.
-			return undef;
-		} else {
-			throw $E;
-		}
-	};
+	my $val = scalar _config_common($self, $var, {'kind' => '--bool'});
+	return (defined $val && $val eq 'true');
 }
 
-
 =item config_path ( VARIABLE )
 
 Retrieve the path configuration C<VARIABLE>. The return value
 is an expanded path or C<undef> if it's not defined.
 
-This currently wraps command('config') so it is not so fast.
-
 =cut
 
 sub config_path {
 	my ($self, $var) = _maybe_self(@_);
-
-	try {
-		my @cmd = ('config', '--path');
-		unshift @cmd, $self if $self;
-		if (wantarray) {
-			return command(@cmd, '--get-all', $var);
-		} else {
-			return command_oneline(@cmd, '--get', $var);
-		}
-	} catch Git::Error::Command with {
-		my $E = shift;
-		if ($E->value() == 1) {
-			# Key not found.
-			return undef;
-		} else {
-			throw $E;
-		}
-	};
+	return _config_common($self, $var, +{'kind' => '--path'});
 }
 
 =item config_int ( VARIABLE )
@@ -667,26 +642,12 @@ or 'g' in the config file will cause the value to be multiplied
 by 1024, 1048576 (1024^2), or 1073741824 (1024^3) prior to output.
 It would return C<undef> if configuration variable is not defined,
 
-This currently wraps command('config') so it is not so fast.
-
 =cut
 
 sub config_int {
 	my ($self, $var) = _maybe_self(@_);
-
-	try {
-		my @cmd = ('config', '--int', '--get', $var);
-		unshift @cmd, $self if $self;
-		return command_oneline(@cmd);
-	} catch Git::Error::Command with {
-		my $E = shift;
-		if ($E->value() == 1) {
-			# Key not found.
-			return undef;
-		} else {
-			throw $E;
-		}
-	};
+	my $val = scalar _config_common($self, $var, +{'kind' => '--int'});
+	return $val;
 }
 
 =item get_colorbool ( NAME )

^ permalink raw reply related

* Re: [PATCH] contrib: add a pair of credential helpers for Mac OS X's keychain
From: Jay Soffian @ 2011-09-30 19:33 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Jeff King, git, Junio C Hamano
In-Reply-To: <CAEBDL5WhpVg17aPuRqrE5=2Q293kVD4fYtxGqRzx_K=87t-jgw@mail.gmail.com>

On Thu, Sep 29, 2011 at 6:03 AM, John Szakmeister <john@szakmeister.net> wrote:
>
> I've been working on a version of the keychain credential cache as
> well.  I did create a gui, although it's a bit painful.

I still don't understand why a CLI app should have a GUI credential prompt.

>>  2. The "get the username from the config" feature is triggered at the
>>     time of prompting the user (so instead of asking for the username,
>>     we check the config and pretend the user told us).
>>
>>     I did it this way originally so that helpers would have the first
>>     crack at setting a username, and we would fall back to the config.
>>     Thinking on it more, that may be backwards. If the user has told
>>     git "for github.com, I am user 'foo'", then that should probably
>>     take effect first, and --username=foo get passed to the helper.

Sorry, missed this part in my previous reply. I don't understand - how
do you ever send a username to the credential helper if you don't get
it from the config? But in any case, if you have a username (via
config or some other way), yes, I think it should be given to the
credential helper.

> I think that makes sense.  I think one thing we have to be careful
> about partial matches.  I wouldn't want the credential cache to send
> off the wrong password to a service.  This may be me being cautious,
> but if I don't have all the necessary bits, I'd rather we fail that to
> guess which entry is right.

The credential helper I wrote doesn't work that way. To do so would
mean using a rather more complicated form of the OS X Security API. It
asks for an entry using whatever fields it has, and OS X returns the
first match that satisfies. It's up to the user to yea/nay that match
if the credential helper isn't on the entry's ACL.

>>> +     /* "GitHub for Mac" compatibility */
>>> +     if (!strcmp(hostname, "github.com"))
>>> +             hostname = "github.com/mac";
>>
>> Nice touch. :)
>
> I honestly don't understand why this needs to be done.

Because GitHub for Mac stores its entries using "github.com/mac" as
the hostname.

> I don't use GitHub for Mac... does that mean this is busted for me?

No. It just means that the credential helper and GitHub for Mac store
their entry in a compatible fashion. (So that each can locate the
entry stored by the other.)

> [snip]
>> My series will also produce "cert:/path/to/certificate" when unlocking a
>> certificate. The other candidates for conversion are smtp-auth (for
>> send-email) and imap (for imap-send).  I guess for certs, you'd want to
>> use the "generic" keychain type.
>
> There is a method for adding a certificate to the keychain:
>   <http://developer.apple.com/library/mac/#documentation/Security/Reference/certifkeytrustservices/Reference/reference.html#//apple_ref/doc/uid/TP30000157>
>
> I'm not sure what that does exactly, but I do have a cert, and it
> shows up as "certificate" in the keychain.

That's for storing a certificate itself. In this case, I think we're
just talking about storing the passphrase which protects the
certificate's private key.

>> I wonder if some people would not want to cache cert passwords. Speaking
>> of which, I remember keychain asking me "do you want to let git see this
>> password?", but I don't ever remember it asking "do you want to save
>> this password?". Is that usually automatic? Again, I was kind of
>> expecting a dialog with a "remember this" checkbox.
>
> By the time you get Keychain involved, the decision has been made.
> Most applications offer that ability... and you're right, this should
> probably offer the same capability.  That also means stashing that
> data somewhere. :-(  OTOH, it does make for a better user experience.

What, no? If you don't want git to store usernames/passwords stored in
the OS X Keychain, don't use the git-osx-keychain credential helper.

j.

^ permalink raw reply

* Re: Git is not scalable with too many refs/*
From: Martin Fick @ 2011-09-30 19:26 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Rast, René Scharfe, Julian Phillips,
	Michael Haggerty
In-Reply-To: <201109301041.13848.mfick@codeaurora.org>

On Friday, September 30, 2011 10:41:13 am Martin Fick wrote:
> I dug into this a bit further.  I took a non gced and non
> packed refs repo and this time instead of gcing it to get
> packedrefs, I only ran the above git pack-refs --all so
> that objects did not get gced.  With this, the noop
> fetch was also only around 12s.  This confirmed that the
> non gced objects are not interfering with the noop
> fetch, the problem really is just the unpacked refs. 
> Just to confirm that the FS is not horribly slow, I did
> a "find .git/refs" and it only takes about .4s for about
> 80Kresults!

Is there a way I can for refs to always be packed?  I didn't 
see a config option for this.  I would like to try a fetch 
this way even if I have to make a small code tweak.  

I tried simulating on the fly ref packing every now and then 
by running the pack from another repo during the fetch, it 
actually slowed things down (by more than the time it took 
to do the packs).


-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

^ permalink raw reply

* Re: [PATCH] contrib: add a pair of credential helpers for Mac OS X's keychain
From: Jay Soffian @ 2011-09-30 19:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, John Szakmeister
In-Reply-To: <20110929075627.GB14022@sigill.intra.peff.net>

On Thu, Sep 29, 2011 at 3:56 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 14, 2011 at 10:51:53PM -0400, Jay Soffian wrote:
>
>> This credential helper adds, searches, and removes entries from
>> the Mac OS X keychain. The C version links against the Security
>> framework and is probably the best choice for daily use.
>>
>> A python version is also included primarily as a more readable
>> example and uses the /usr/bin/security CLI to access the keychain.
>>
>> Tested with 10.6.8.
>
> So I finally got a nice working OS X setup (10.7) to play around with
> these. Overall, works as advertised. :) I have a few comments, though.
>
>> Here's a C version that no longer links to git. I also kept the original
>> Python version as an example. I decided not to call out to
>> 'git credential-gitpass' as it was simple enough to manage /dev/tty
>> and there's no portability issues since this is OS X specific.
>
> This was my first one. I kind of expected there to be some kind of
> graphical password dialog. Especially because keychain will pop up a
> dialog and ask you "is it OK for git to access this password?". So I
> sort of assumed that people would assume that credentials happened
> outside of the regular terminal session (I see the same thing on Linux,
> for example, with gpg-agent, which will open a new window and grab
> focus).
>
> But I have no idea what's "normal" on OS X.

This makes no sense to me at all. Ignore OS X for the moment. You use
git on the command-line. Why would there be any expectation of it
interacting with the user via anything other than the terminal.

Anyway, I expect the username/password prompt on the command-line, in
the same terminal window where I just ran the git command that needs
credentials.

> I wondered if you were trying to be friendly to people who were
> connecting via ssh. But that doesn't seem to work at all. I couldn't get
> either version of your helper to actually do anything in an ssh session
> (even with the same user logged in on console).  I guess there is some
> magic to hook it into the keychain manager.

I don't understand where you're running ssh from/to in this scenario,
but OS X has a notion of security contexts (this is a bit of a
tangent):

http://developer.apple.com/library/mac/#technotes/tn2083/_index.html

> Also, regarding opening /dev/tty yourself versus using getpass. There
> are a few magic things getpass will do that your helper won't:
>
>  1. It respects core.askpass, GIT_ASKPASS, and SSH_ASKPASS if they are
>     set.
>
>  2. The "get the username from the config" feature is triggered at the
>     time of prompting the user (so instead of asking for the username,
>     we check the config and pretend the user told us).
>
>     I did it this way originally so that helpers would have the first
>     crack at setting a username, and we would fall back to the config.
>     Thinking on it more, that may be backwards. If the user has told
>     git "for github.com, I am user 'foo'", then that should probably
>     take effect first, and --username=foo get passed to the helper.
>
>     It doesn't make a big difference with long-term storage helpers,
>     because you tell them your username once and they remember it. But
>     for things like credential-cache, it lets you store the username
>     for a long time, but only cache the password (which means not
>     typing the username every time).
>
> So I think maybe reason (2) should go away. But (1) is definitely worth
> considering.

I found it ugly that git's native getpass doesn't echo the username
back, and it seems hackish to me for the credential helper to turn
back around and invoke it in any case. :-(

>> +       if (!unique)
>> +               die("Must specify --unique=TOKEN; try --help");
>
> My test harness checks that this case just asks for the password without
> bothering to do any lookup or storage. It probably doesn't really matter
> in practice; I think git should always be providing _some_ context.

Okay, that wasn't clear from whatever documentation I read on how
credential helpers should behave. But why invoke the credential helper
just to ask for a password?

>> +     hostname = strchr(unique, ':');
>> +     if (!hostname)
>> +             die("Invalid token `%s'", unique);
>> +     *hostname++ = '\0';
>
> Hrm. I was really hoping people wouldn't need to pick apart the "unique"
> token, and it could remain an opaque blob. If helpers are going to do
> this sort of parsing, then I'd just as soon have git break it down for
> them, and do something like:
>
>  git credential-osxkeychain \
>    --protocol=https \
>    --host=github.com \
>    --path=peff/git.git
>    --username=peff
>
> to just hand over as much information as possible, and let the helper
> throw it all together if it wants to.

Keychain entries have distinct fields. I broke apart the token and
stored it the way other applications mostly do on OS X.

>> +     /* "GitHub for Mac" compatibility */
>> +     if (!strcmp(hostname, "github.com"))
>> +             hostname = "github.com/mac";
>
> Nice touch. :)
>
>> +     if (!strcmp(unique, "https")) {
>> +             protocol = kSecProtocolTypeHTTPS;
>> +     } else if (!strcmp(unique, "http")) {
>> +             protocol = kSecProtocolTypeHTTP;
>> +     }
>> +     else
>> +             die("Unrecognized protocol `%s'", unique);
>
> My series will also produce "cert:/path/to/certificate" when unlocking a
> certificate. The other candidates for conversion are smtp-auth (for
> send-email) and imap (for imap-send).  I guess for certs, you'd want to
> use the "generic" keychain type.

Yep, I was punting on certificate for v1.

> I wonder if some people would not want to cache cert passwords. Speaking
> of which, I remember keychain asking me "do you want to let git see this
> password?", but I don't ever remember it asking "do you want to save
> this password?". Is that usually automatic? Again, I was kind of
> expecting a dialog with a "remember this" checkbox.

Each keychain entry has an ACL of applications that are allowed to
access it. When an application asks for an entry and the application
isn't on that entry's ACL, OS X (not the application) presents the
user the dialog you refer to. The application has no control over that
dialog.

Now, I could have the credential helper ask the user "store this
password?" after prompting for it, but why even use a credential
helper if you don't want it to store your credentials?

>> +def add_internet_password(protocol, hostname, username, password):
>> +    # We do this over a pipe so that we can provide the password more
>> +    # securely than as an argument which would show up in ps output.
>> +    # Unfortunately this is possibly less robust since the security man
>> +    # page does not document how to quote arguments. Emprically it seems
>> +    # that using the double-quote, escaping \ and " works properly.
>> +    username = username.replace('\\', '\\\\').replace('"', '\\"')
>> +    password = password.replace('\\', '\\\\').replace('"', '\\"')
>> +    command = ' '.join([
>> +        'add-internet-password', '-U',
>> +        '-r', protocol,
>> +        '-s', hostname,
>> +        '-a "%s"' % username,
>> +        '-w "%s"' % password,
>> +        '-j default',
>> +        '-l "%s (%s)"' % (hostname, username),
>> +    ]) + '\n'
>> +    args = ['/usr/bin/security', '-i']
>> +    p = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
>> +    p.communicate(command)
>
> I noticed that when using the python helper, the dialog asking something
> like: "security wants to know this password. Allow it?"
>
> Which was kind of lame. I would hope we could convince it to say "git".
> But I didn't see any option in the "security" tool for specifying the
> context[1]. The C helper says "git-credential-osxkeychain". Which isn't
> the end of the world, but it would be prettier if it just said "git".

That's partly why I wrote the C version.

> [1] I can kind of see why they might not want you to set this for
> security reasons (because it makes impersonating other programs easy).
> On the other hand, saying "security" conveys absolutely nothing. And as
> far as I can tell, I could just call my program /tmp/iTunes, and it
> would say "iTunes wants to know this password...".

It is for security reasons. 99% of users will probably just click
"Okay" no matter what. For the 1% that bother to pay attention to the
dialog, it provides the full path to the binary. I'd be suspicious if
/tmp/iTunes wanted a password.

j.

^ permalink raw reply

* Re: [PATCH v3 1/4] compat/win32/sys/poll.c: upgrade from upstream
From: René Scharfe @ 2011-09-30 19:00 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, peff, j6t, gitster
In-Reply-To: <1317329963-6656-2-git-send-email-kusmabite@gmail.com>

Am 29.09.2011 22:59, schrieb Erik Faye-Lund:
> poll.c is updated from revision adc3a5b in
> git://git.savannah.gnu.org/gnulib.git
> 
> The changes are applied with --whitespace=fix to reduce noise.
> 
> poll.h is not upgraded, because the most recent version now
> contains template-stuff that breaks compilation for us.

> @@ -27,7 +27,10 @@
>  #include <malloc.h>
>  
>  #include <sys/types.h>
> -#include "poll.h"
> +
> +/* Specification.  */
> +#include <poll.h>
> +

In order to make bisecting easier, I think it's a good idea to squash in
the next patch that undoes this part, i.e. simply skip this hunk and add
a third non-verbatim note to the commit message.

René

^ permalink raw reply

* Re: [PATCH v3 3/4] enter_repo: do not modify input
From: René Scharfe @ 2011-09-30 19:00 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, peff, j6t, gitster
In-Reply-To: <1317329963-6656-4-git-send-email-kusmabite@gmail.com>

Am 29.09.2011 22:59, schrieb Erik Faye-Lund:
> diff --git a/path.c b/path.c
> index 6f3f5d5..f7dfd0b 100644
> --- a/path.c
> +++ b/path.c
> @@ -283,7 +283,7 @@ return_null:
>   * links.  User relative paths are also returned as they are given,
>   * except DWIM suffixing.
>   */
> -char *enter_repo(char *path, int strict)
> +const char *enter_repo(const char *path, int strict)
>  {
>  	static char used_path[PATH_MAX];
>  	static char validated_path[PATH_MAX];
> @@ -297,14 +297,15 @@ char *enter_repo(char *path, int strict)
>  		};
>  		int len = strlen(path);
>  		int i;
> -		while ((1 < len) && (path[len-1] == '/')) {
> -			path[len-1] = 0;
> +		while ((1 < len) && (path[len-1] == '/'))
>  			len--;
> -		}
> +
>  		if (PATH_MAX <= len)
>  			return NULL;
> -		if (path[0] == '~') {
> -			char *newpath = expand_user_path(path);
> +		strncpy(used_path, path, len);
> +
> +		if (used_path[0] == '~') {
> +			char *newpath = expand_user_path(used_path);
>  			if (!newpath || (PATH_MAX - 10 < strlen(newpath))) {
>  				free(newpath);
>  				return NULL;
> @@ -316,24 +317,21 @@ char *enter_repo(char *path, int strict)
>  			 * anyway.
>  			 */
>  			strcpy(used_path, newpath); free(newpath);
> -			strcpy(validated_path, path);
> -			path = used_path;
> +			strcpy(validated_path, used_path);
>  		}
>  		else if (PATH_MAX - 10 < len)
>  			return NULL;
> -		else {
> -			path = strcpy(used_path, path);
> -			strcpy(validated_path, path);
> -		}
> -		len = strlen(path);
> +		else
> +			strcpy(validated_path, used_path);
> +		len = strlen(used_path);
>  		for (i = 0; suffix[i]; i++) {
> -			strcpy(path + len, suffix[i]);
> -			if (!access(path, F_OK)) {
> +			strcpy(used_path + len, suffix[i]);
> +			if (!access(used_path, F_OK)) {
>  				strcat(validated_path, suffix[i]);
>  				break;
>  			}
>  		}
> -		if (!suffix[i] || chdir(path))
> +		if (!suffix[i] || chdir(used_path))
>  			return NULL;
>  		path = validated_path;
>  	}

The use of strcpy and strncpy makes me nervous, but I can't spot a bug
currently and strcpy and even strcat calls had been already in there
before your patch.

René

^ permalink raw reply

* Re: Updated tag 'junio-gpg-pub' ?
From: Andreas Schwab @ 2011-09-30 18:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Näwe, Git List
In-Reply-To: <7voby2oy2s.fsf@alter.siamese.dyndns.org>

You might want to update the tag message the next time with
s/git-/git /.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH] Clarify that '--tags' fetches tags only
From: Junio C Hamano @ 2011-09-30 18:37 UTC (permalink / raw)
  To: Peter Shenkin; +Cc: git
In-Reply-To: <loom.20110930T041939-332@post.gmane.org>

Peter Shenkin <shenkin@gmail.com> writes:

> Perhaps it will be useful to say what would have been most
> helpful for me. In the current documentation for "fetch
> --tags", one sentence reads, "This flag lets all tags and
> their associated  objects be downloaded." The following small
> modification would, IMO, be sufficient: "This flag causes all
> tags and their associated objects (only) to be downloaded."

Hmm, from time to time we seem to see this kind of documentation
suggestion where:

 - We (try to) describe what xyzzy does by saying "This is what xyzzy
   does". We specifically do not say "In addition to what normally
   happens, xyzzy causes these additional things to happen."

 - The reader (somehow) assumes xyzzy does more than what we described in
   the documentation, even we did not say "In addition to..."; and then

 - A patch is proposed to add "these other things are _not_ done", after
   existing "This is what xyzzy does".

And it is not limited to the description of this particular option.

I think in general our documentation aims to spell out _all_ that happens,
and explicitly say "In addition to what normally happens", "This page
lists only the most common ways", etc., when such a clatification is
needed.

I am wondering if there is a systemic failure that gives an impression
that by default the documentation is incomplete and all other unspecified
thing also happens to the readers? If so are there things that we could
do better without going through individual description?

^ permalink raw reply

* Re: [PATCH/RFCv3 2/2] receive-pack: don't pass non-existent refs to post-{receive,update} hooks in push deletions
From: Junio C Hamano @ 2011-09-30 18:19 UTC (permalink / raw)
  To: Pang Yan Han
  Cc: git, Sitaram Chamarty, Shawn O. Pearce, Jeff King,
	Johannes Schindelin
In-Reply-To: <20110930132903.GA1622@myhost>

Pang Yan Han <pangyanhan@gmail.com> writes:

> Sorry for asking, but do I need to reroll this with the fixup in
> origin/ph/push-to-delete-nothing ? Is the commit message fine especially
> in light of the changes in the fixup?

If you think that the result of squashing the fix-up commit into your
patch looks OK, I do not think there is a need to reroll. The patch title
may need to be shortened, but other than that I do not see anything
glaringly wrong in the commit log message that I cannot amend out.

Thanks.

^ permalink raw reply

* Re: Git is not scalable with too many refs/*
From: René Scharfe @ 2011-09-30 18:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Martin Fick, Julian Phillips, Christian Couder, git,
	Christian Couder, Thomas Rast
In-Reply-To: <7vfwjeotv1.fsf@alter.siamese.dyndns.org>

Am 30.09.2011 18:52, schrieb Junio C Hamano:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> 
>> Hi Martin,
>>
>> Am 29.09.2011 22:11, schrieb Martin Fick:
>>> Your patch works well for me.  It achieves about the same 
>>> gains as Julian's patch. Thanks!
>>
>> OK, and what happens if you apply the following patch on top of my first
>> one?  It avoids going through all the refs a second time during cleanup,
>> at the cost of going through the list of all known objects.  I wonder if
>> that's any faster in your case.
>> ...
>>  static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
>> @@ -690,8 +689,7 @@ static void orphaned_commit_warning(struct commit *commit)
>>  	else
>>  		describe_detached_head(_("Previous HEAD position was"), commit);
>>  
>> -	clear_commit_marks(commit, -1);
>> -	for_each_ref(clear_commit_marks_from_one_ref, NULL);
>> +	clear_commit_marks_for_all(ALL_REV_FLAGS);
>>  }
> 
> The function already clears all the flag bits from commits near the tip of
> all the refs (i.e. whatever commit it traverses until it gets to the fork
> point), so it cannot be reused in other contexts where the caller
> 
>  - first marks commit objects with some flag bits for its own purpose,
>    unrelated to the "orphaned"-ness check;
>  - calls this function to issue a warning; and then
>  - use the flag it earlier set to do something useful.
> 
> which requires "cleaning after yourself, by clearing only the bits you
> used without disturbing other bits that you do not use" pattern.

Yes, clear_commit_marks_for_all is a bit brutal.  Callers could clear
specfic bits (e.g. SEEN|UNINTERESTING) instead of ALL_REV_FLAGS, though.

> It might be a better solution to not bother to clear the marks at all;
> would it break anything in this codepath?

Unfortunately, yes; the cleanup part was added by 5c08dc48 later, when
it become apparent that it's really needed.

However, since the patch only buys us a 5% speedup I'm not sure it's
worth it in its current form.

René

^ permalink raw reply

* Re: [RFC/PATCH]: reverse bisect v 2.0
From: Junio C Hamano @ 2011-09-30 18:13 UTC (permalink / raw)
  To: Michal Vyskocil
  Cc: git, Sverre Rabbelier, Johannes Sixt, Jeff King, Christian Couder
In-Reply-To: <20110930114220.GA742@zelva.suse.cz>

[administrivia: added people involved in the main discussion thread back
on CC line, and also added CCouder who seems to be fond of the command]

Michal Vyskocil <mvyskocil@suse.cz> writes:

> The bugfix command works like the previous git bisect start --reverse.

Does any released version of git have "bisect start --reverse"?

$ git bisect start --reverse
unrecognised option: '--reverse'

Let's suppose that we had a "git frotz" Porcelain subcommand, that used to
say "xyzzy" when all is well, but says "nitfol" these days. I released a
(bad) script that parses the output from the subcommand, and want to say
something like "This script only works with Git version after X", and to
find out the value of X, I need to bisect the history to find when "frotz"
started to say "nitfol".

I am not trying to find a commit that introduced a bug/regression to cause
the recent "frotz" to say "nitfol". Neither am I trying to find a fix that
corrected the earlier bogus output "xyzzy". No value judgement is involved
in this scenario.

I wonder if something along the following line would make the usage more
pleasant and self-explanatory:

    $ git bisect start --used-to='git frotz says xyzzy' v0.99 master
    Bisecting: 171 revisions left to test after this (roughly 8 steps)
    $ ... build and then test ...
    $ git bisect tested
    You are trying to check: git frotz says xyzzy
    Does the result satisify what you are trying to find [y/n]? yes

Saying 'yes' would be like saying 'good' and 'no' would be like saying
'bad' here.

When trying to find regression, you would say:

    $ git bisect start --used-to='it works' v0.99 master

and you say 'yes' if it works (equivalent to 'good'), and 'no' if it does
not (equivalent to 'bad').

When trying to find a fix, you would say:

    $ git bisect start --used-to='checkout $tree $path clobbers $path' v0.99 master

and you say 'yes' if it is still broken at the tested version (equivalent
to your reversed 'good'), and 'no' if the tested version contains a fix
(equivalent to your reversed 'bad').

I am not married to the name of the option, but --used-to='...' felt more
or less self-explanatory name to signal users what to describe in that
text. The condition "You are trying to check:" cue wants to remind the
user is "we used to do *this* but now we don't", and the command is trying
to see when it changed.

Thoughts?

^ permalink raw reply

* [PATCH] refs: Remove duplicates after sorting with qsort
From: Julian Phillips @ 2011-09-30 17:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Julian Phillips, Martin Fick, Christian Couder, git,
	Christian Couder, Thomas Rast, Michael Haggerty
In-Reply-To: <7vk48qouht.fsf@alter.siamese.dyndns.org>

The previous custom merge sort would drop duplicate entries as part of
the sort.  It would also die if the duplicate entries had different
sha1 values.  The standard library qsort doesn't do this, so we have
to do it manually afterwards.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---

On Fri, 30 Sep 2011 09:38:54 -0700, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>
>> On 09/30/2011 01:48 AM, Junio C Hamano wrote:
>>> This version looks sane, although I have a suspicion that it may 
>>> have
>>> some interaction with what Michael may be working on.
>>
>> Indeed, I have almost equivalent changes in the giant patch series 
>> that
>> I am working on [1].
>
> Good; that was the primary thing I wanted to know.  I want to take
> Julian's patch early but if the approach and data structures were
> drastically different from what you are cooking, that would force
> unnecessary reroll on your part, which I wanted to avoid.
>
> Thanks.

I had a quick look at Michael's code, and it reminded me that I had missed one
thing out.  If we want to keep the duplicate detection & removal from the
original merge sort then this patch is needed on top of v3 of the binary search.

Though I never could figure out how duplicate refs were supposed to appear ... I
tested by editing packed-refs, but I assume that isn't "supported".

 refs.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index 4c01d79..cf080ee 100644
--- a/refs.c
+++ b/refs.c
@@ -77,7 +77,29 @@ static int ref_entry_cmp(const void *a, const void *b)
 
 static void sort_ref_array(struct ref_array *array)
 {
+	int i = 0, j = 1;
+
+	/* Nothing to sort unless there are at least two entries */
+	if (array->nr < 2)
+		return;
+
 	qsort(array->refs, array->nr, sizeof(*array->refs), ref_entry_cmp);
+
+	/* Remove any duplicates from the ref_array */
+	for (; j < array->nr; j++) {
+		struct ref_entry *a = array->refs[i];
+		struct ref_entry *b = array->refs[j];
+		if (!strcmp(a->name, b->name)) {
+			if (hashcmp(a->sha1, b->sha1))
+				die("Duplicated ref, and SHA1s don't match: %s",
+				    a->name);
+			warning("Duplicated ref: %s", a->name);
+			continue;
+		}
+		i++;
+		array->refs[i] = array->refs[j];
+	}
+	array->nr = i + 1;
 }
 
 static struct ref_entry *search_ref_array(struct ref_array *array, const char *name)
-- 
1.7.6.1

^ permalink raw reply related

* Re: [GUILT 1/6] Refuse to push corrupt patches
From: Josef 'Jeff' Sipek @ 2011-09-30 17:15 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: git
In-Reply-To: <1317219324-10319-1-git-send-email-alan.christopher.jenkins@googlemail.com>

Thanks for the patches.  I looked them over and they look good.  I'll give
them another thorough reading once I have a bit of time to apply them.

Jeff.

On Wed, Sep 28, 2011 at 03:15:19PM +0100, Alan Jenkins wrote:
> "guilt push" would treat corrupt patches as empty,
> because "git apply --numstat" prints nothing on stdout.
> 
> (You do get an error message on stderr,
>  but then guilt says "Patch applied" etc,
>  and I didn't notice the earlier error message
>  for quite some time.)
> 
> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@googlemail.com>
> ---
>  guilt |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/guilt b/guilt
> index d1e17d4..51532f9 100755
> --- a/guilt
> +++ b/guilt
> @@ -611,7 +611,7 @@ push_patch()
>  		cd_to_toplevel
>  
>  		# apply the patch if and only if there is something to apply
> -		if [ `git apply --numstat "$p" | wc -l` -gt 0 ]; then
> +		if [ `do_get_patch "$p" | wc -l` -gt 0 ]; then
>  			if [ "$bail_action" = abort ]; then
>  				reject=""
>  			fi
> -- 
> 1.7.4.1
> 

-- 
Hegh QaQ law'
quvHa'ghach QaQ puS

^ permalink raw reply

* Re: Git is not scalable with too many refs/*
From: Junio C Hamano @ 2011-09-30 16:52 UTC (permalink / raw)
  To: René Scharfe
  Cc: Martin Fick, Julian Phillips, Christian Couder, git,
	Christian Couder, Thomas Rast, Junio C Hamano
In-Reply-To: <4E8587E8.9070606@lsrfire.ath.cx>

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Hi Martin,
>
> Am 29.09.2011 22:11, schrieb Martin Fick:
>> Your patch works well for me.  It achieves about the same 
>> gains as Julian's patch. Thanks!
>
> OK, and what happens if you apply the following patch on top of my first
> one?  It avoids going through all the refs a second time during cleanup,
> at the cost of going through the list of all known objects.  I wonder if
> that's any faster in your case.
> ...
>  static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
> @@ -690,8 +689,7 @@ static void orphaned_commit_warning(struct commit *commit)
>  	else
>  		describe_detached_head(_("Previous HEAD position was"), commit);
>  
> -	clear_commit_marks(commit, -1);
> -	for_each_ref(clear_commit_marks_from_one_ref, NULL);
> +	clear_commit_marks_for_all(ALL_REV_FLAGS);
>  }

The function already clears all the flag bits from commits near the tip of
all the refs (i.e. whatever commit it traverses until it gets to the fork
point), so it cannot be reused in other contexts where the caller

 - first marks commit objects with some flag bits for its own purpose,
   unrelated to the "orphaned"-ness check;
 - calls this function to issue a warning; and then
 - use the flag it earlier set to do something useful.

which requires "cleaning after yourself, by clearing only the bits you
used without disturbing other bits that you do not use" pattern.

It might be a better solution to not bother to clear the marks at all;
would it break anything in this codepath?

^ permalink raw reply

* Re: Git is not scalable with too many refs/*
From: Martin Fick @ 2011-09-30 16:41 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Rast, René Scharfe, Julian Phillips,
	Michael Haggerty
In-Reply-To: <201109262056.04279.chriscool@tuxfamily.org>

> On Monday, September 26, 2011 12:56:04 pm Christian Couder 
wrote:
> After "git pack-refs --all" I get:

OK.   So many great improvements in ref scalability, thanks 
everyone!

It is getting so good, that I had to take a step back and 
re-evaluate what we consider good/bad.  On doing so, I can't 
help but think that fetches still need some improvement.

Fetches had the worst regression of all > 8days, so the 
massive fix to bring it down to 7.5mins was awesome.  
7-8mins sounded pretty good 2 weeks ago, especially when a 
checkout took 5+ mins!  but now that almost every other 
operation has been sped up, that is starting to feel a bit 
on the slow side still.  My spidey sense tells me something 
is still not quite right in the fetch path.

Here is some more data to backup my spidey sense: after all 
the improvements, a noop fetch of all the changes (noop 
meaning they are all already uptodate) takes around 
3mins with a non gced (non packed refs) case.  That same 
noop only takes ~12s in the gced (packed ref case)!

I dug into this a bit further.  I took a non gced and non 
packed refs repo and this time instead of gcing it to get 
packedrefs, I only ran the above git pack-refs --all so that
objects did not get gced.  With this, the noop fetch was 
also only around 12s.  This confirmed that the non gced 
objects are not interfering with the noop fetch, the problem 
really is just the unpacked refs.  Just to confirm that the 
FS is not horribly slow, I did a "find .git/refs" and it 
only takes about .4s for about 80Kresults!

So, while I understand that a full fetch will actually have 
to transfer quite a bit of data, the noop fetch seems like 
it is still suffering in the non gced (non packed ref case).  
If that time were improved, I suspect that the full fetch 
will improve at least by an equivalent amount, if not more.

Any thoughts?

-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

^ permalink raw reply

* Re: [PATCH v3] refs: Use binary search to lookup refs faster
From: Junio C Hamano @ 2011-09-30 16:38 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Julian Phillips, Martin Fick, Christian Couder, git,
	Christian Couder, Thomas Rast
In-Reply-To: <4E85E07C.5070402@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 09/30/2011 01:48 AM, Junio C Hamano wrote:
>> This version looks sane, although I have a suspicion that it may have
>> some interaction with what Michael may be working on.
>
> Indeed, I have almost equivalent changes in the giant patch series that
> I am working on [1].

Good; that was the primary thing I wanted to know.  I want to take
Julian's patch early but if the approach and data structures were
drastically different from what you are cooking, that would force
unnecessary reroll on your part, which I wanted to avoid.

Thanks.

^ permalink raw reply

* Re: Git is not scalable with too many refs/*
From: Martin Fick @ 2011-09-30 16:09 UTC (permalink / raw)
  To: René Scharfe
  Cc: Julian Phillips, Christian Couder, git, Christian Couder,
	Thomas Rast, Junio C Hamano
In-Reply-To: <4E8587E8.9070606@lsrfire.ath.cx>

On Friday, September 30, 2011 03:12:08 am René Scharfe 
wrote:
> OK, and what happens if you apply the following patch on
> top of my first one?  It avoids going through all the
> refs a second time during cleanup, at the cost of going
> through the list of all known objects.  I wonder if
> that's any faster in your case.


This patch helps a bit more.  It seems to shave about 
another .5s off in packed and non packed case w or w/o 
binary search.

-Martin



> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 84e0cdc..a4b1003 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -596,15 +596,14 @@ static int
> add_pending_uninteresting_ref(const char *refname,
> return 0;
>  }
> 
> -static int clear_commit_marks_from_one_ref(const char
> *refname, -				      const unsigned 
char *sha1,
> -				      int flags,
> -				      void *cb_data)
> +static void clear_commit_marks_for_all(unsigned int
> mark) {
> -	struct commit *commit =
> lookup_commit_reference_gently(sha1, 1); -	if (commit)
> -		clear_commit_marks(commit, -1);
> -	return 0;
> +	unsigned int i, max = get_max_object_index();
> +	for (i = 0; i < max; i++) {
> +		struct object *object = 
get_indexed_object(i);
> +		if (object && object->type == OBJ_COMMIT)
> +			object->flags &= ~mark;
> +	}
>  }
> 
>  static void describe_one_orphan(struct strbuf *sb,
> struct commit *commit) @@ -690,8 +689,7 @@ static void
> orphaned_commit_warning(struct commit *commit) else
>  		describe_detached_head(_("Previous HEAD 
position
> was"), commit);
> 
> -	clear_commit_marks(commit, -1);
> -	for_each_ref(clear_commit_marks_from_one_ref, NULL);
> +	clear_commit_marks_for_all(ALL_REV_FLAGS);
>  }
> 
>  static int switch_branches(struct checkout_opts *opts,
> struct branch_info *new) --
> To unsubscribe from this list: send the line "unsubscribe
> git" in the body of a message to
> majordomo@vger.kernel.org More majordomo info at 
> http://vger.kernel.org/majordomo-info.html

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

^ permalink raw reply

* Re: [PATCH v3] refs: Use binary search to lookup refs faster
From: Martin Fick @ 2011-09-30 15:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Julian Phillips, Christian Couder, git, Christian Couder,
	Thomas Rast
In-Reply-To: <7vwrcqpuc7.fsf@alter.siamese.dyndns.org>

On Thursday, September 29, 2011 09:44:40 pm Junio C Hamano 
wrote:
> Martin Fick <mfick@codeaurora.org> writes:
> > This works for me, however unfortunately, I cannot find
> > any scenarios where it improves anything over the
> > previous fix by René.  :(
> 
> Nevertheless, I would appreciate it if you can try this
> _without_ René's patch. This attempts to make
> resolve_ref() cheap for _any_ caller. René's patch
> avoids calling it in one specific callchain.
> 
> They address different issues. René's patch is probably
> an independently good change (I haven't thought about
> the interactions with the topics in flight and its
> implications on the future direction), but would not
> help other/new callers that make many calls to
> resolve_ref().

Agreed.  Here is what I am seeing without René's patch.

Checkout in NON packed ref repo takes about 20s, with patch 
v3 of binary search, it takes about 11s (1s slower than 
René's patch).

Checkout in packed ref repo takes about 5:30min, with patch 
v3 of binary search, it takes about 10s (also 1s slower than 
René's patch).

I'd say that's not bad, it seems like the 1s difference is 
doing the search 60K+times (my tests don't quite scan the 
full list), so the search seems to scale well with patch v3.

-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

^ permalink raw reply

* how to produce an index with smudged entries
From: Christian Halstrick @ 2011-09-30 15:33 UTC (permalink / raw)
  To: git, Robin Rosenberg

I am trying to find out how native git handles the racy git problem. I
read https://raw.github.com/git/git/master/Documentation/technical/racy-git.txt.
But I cannot reproduce the behaviour described in this text.

I want to create an index which contains entries where the length of
the file was set to 0 to mark the entry as "racily clean". As I
understood the text you have to modify a file and execute 'git
update-index <file>' so fast that in the end the files modification
time is the same as the modification time of the file .git/index. That
should be sufficient that when persisting the index during the 'git
update-index' call git should detect that the files mod-time is
younger than the index mod time  and should mark this entry "racily
clean" with a length of 0.

Here is the script with which I try to produce it but it always fails
to create an index with a smudged entry. Any ideas? (Btw: I would
assume that the first entry in the index is smduged when I find 4
zeros beginning at position 48. That's what I deduced from
https://github.com/gitster/git/blob/master/Documentation/technical/index-format.txt)

#!/bin/bash

git init
echo "initial" > foo

# wait for the next tick on the filesystem clock
touch foo
lastTouch=$(stat -c %Y foo)
while [ $lastTouch = $(stat -c %Y foo) ] ;do
  touch foo
done

git add foo

# modify 'foo' and update the index
# (hopefully before the filesystem clock has increased)
# repeat this step until the filesystem timer increased and
# print the index everytime
lastTouch=$(stat -c %Y foo)
while [ $lastTouch = $(stat -c %Y foo) ] ;do
  echo "modification #$n" > foo
  git update-index foo
  let "n += 1"
  echo "after attempt $n:"
  # dump 4 bytes of index beginning from pos 48.
  # That should be the length of the first entry
  od -j 48 -N 4 -x -A d .git/index
done

^ permalink raw reply

* Bug?: 'git log --find-copies' doesn't match 'git log --follow <rev> -- path/to/file'
From: Alexander Pepper @ 2011-09-30 15:32 UTC (permalink / raw)
  To: git

Hello Again.

I'm not really sure, if this is a bug or if I am missing something, but the following is quite annoying:

$ git version
git version 1.7.6.3
$ git clone https://github.com/voldemort/voldemort.git
$ cd voldemort
$ git log --numstat --find-copies dd4e90f9
...
3       15      contrib/ec2-testing/src/java/voldemort/utils/{StopClusterException.java => ClusterOperation.java}
$ git log --numstat dd4e90f9 -- contrib/ec2-testing/src/java/voldemort/utils/ClusterOperation.java
...
23      0       contrib/ec2-testing/src/java/voldemort/utils/ClusterOperation.java
$ git log --numstat --follow dd4e90f9 -- contrib/ec2-testing/src/java/voldemort/utils/ClusterOperation.java
...
6       10      src/java/voldemort/annotations/concurrency/Immutable.java => contrib/ec2-testing/src/java/voldemort/utils/ClusterOperation.java

So git log with copy and rename detection on (--find-copies) tells me, that the file StopClusterException.java is copied to ClusterOperation.java. But If I ask git log for that specific file with --follow git claims a copy from Immutable.java to ClusterOperation.java!

I understand, that git doesn't record renames and copies, but only detects it afterwords. But at least I would expect, that git detects the same thing consistently between to (quite) alike flags.

I also tried adding "--find-copies" and "--find-copies-harder" to 'git log --numstat --follow dd4e90f9', but they also result in claiming Immutable.java as the origin of the copy.

Is there a flag to get consistent results between the two or is this really a bug?

Greetings from Berlin
Alex

^ permalink raw reply

* Re: [PATCH v3] refs: Use binary search to lookup refs faster
From: Michael Haggerty @ 2011-09-30 15:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Julian Phillips, Martin Fick, Christian Couder, git,
	Christian Couder, Thomas Rast
In-Reply-To: <7v62karjv3.fsf@alter.siamese.dyndns.org>

On 09/30/2011 01:48 AM, Junio C Hamano wrote:
> This version looks sane, although I have a suspicion that it may have
> some interaction with what Michael may be working on.

Indeed, I have almost equivalent changes in the giant patch series that
I am working on [1].  The branch is very experimental.  The tip
currently passes all the tests, but it has a known performance
regression in connection if "git fetch" is used to fetch many commits.

But before comparing ref-related optimizations, we have an *urgent* need
for a decent performance test suite.  There are many slightly different
scenarios that have very different performance characteristics, and we
have to be sure that we are optimizing for the whole palette of
many-reference use cases.  So I made an attempt at a kludgey but
somewhat flexible performance-testing script [2].  I don't know whether
something like this should be integrated into the git project, and if so
where; suggestions are welcome.

To run the tests, from the root of the git source tree:

    make # make sure git is up-to-date
    t/make-refperf-repo --help
    t/make-refperf-repo [OPTIONS]
    t/refperf
    cat refperf.times # See the results

The default repo has 5k commits in a linear series with one reference on
each commit.  (These numbers can both be adjusted.)

The reference namespace can be laid out a few ways:

* Many references in a single "directory" vs. sharded over many
"directories"

* In lexicographic order by commit, in reverse order, or "shuffled".

By default, the repo is written to "refperf-repo".

The time it takes to create the test repository is itself also an
interesting benchmark.  For example, on the maint branch it is terribly
slow unless it is passed either the --pack-refs-interval=N (with N, say
100) or --no-replace-object option.  I also noticed that if it is run like

    t/make-refperf-repo --refs=5000 --commits=5000 \
            --pack-refs-interval=100

(one ref per commit), git-pack-refs becomes precipitously and
dramatically slower after the 2000th commit.

I haven't had time yet for systematic benchmarks of other git versions.

See the refperf script to see what sorts of benchmarks that I have built
into it so far.  The refperf test is non-destructive; it always copies
from "refperf-repo" to "refperf-repo-copy" and does its tests in the
copy; therefore a test repo can be reused.  The timing data are written
to "refperf.times" and other output to "refperf.log".

Here are my refperf results for the "maint" branch on my notebook with
the default "make-refperf-repo" arguments (times in seconds):

3.36 git branch (cold)
0.01 git branch (warm)
0.04 git for-each-ref
3.08 git checkout (cold)
0.01 git checkout (warm)
0.00 git checkout --orphan (warm)
0.15 git checkout from detached orphan
0.12 git pack-refs
1.17 git branch (cold)
0.00 git branch (warm)
0.17 git for-each-ref
0.95 git checkout (cold)
0.00 git checkout (warm)
0.00 git checkout --orphan (warm)
0.21 git checkout from detached orphan
0.18 git branch -a --contains
7.67 git clone
0.06 git fetch (nothing)
0.01 git pack-refs
0.05 git fetch (nothing, packed)
0.10 git clone of a ref-packed repo
0.63 git fetch (everything)

Probably we should test with even more references than this, but this
test already shows that some commands are quite sluggish.

There are some more things that could be added, like:

* Branches vs. annotated tags

* References on the tips of branches in a more typical "branchy" repository.

* git describe --all

* git log --decorate

* git gc

* git filter-branch
  (This has very different performance characteristics because it is a
script that invokes git many times.)

I suggest that we try to do systematic benchmarking of any changes that
we claim are performance optimizations and share before/after results in
the cover letter for the patch series.

Michael

[1] branch hierarchical-refs at git://github.com/mhagger/git.git
[2] branch refperf at git://github.com/mhagger/git.git

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: Updated tag 'junio-gpg-pub' ?
From: Junio C Hamano @ 2011-09-30 15:21 UTC (permalink / raw)
  To: Stefan Näwe; +Cc: Junio C. Hamano, Git List
In-Reply-To: <4E856676.3050209@atlas-elektronik.com>

Stefan Näwe <stefan.naewe@atlas-elektronik.com> writes:

> Junio,
>
> I haven't seen any announcement of this:
>
>> Fetching origin
>> >From http://github.com/gitster/git
>> - [tag update]      junio-gpg-pub -> junio-gpg-pub
>
> Did you update your GPG key ?

I removed a user@my-old-isp.com from the list of uids contained within the
public key, and I updated that tag to point at a blob that records a new
export of the result.  The "key" itself is not updated (it uses the same
crypto material).  People who got my public key from the old tag should
still be able to run "verify-tag" and get the right result.

Thanks for noticing.

^ permalink raw reply

* Re: [PATCH/RFC] add lame win32 credential-helper
From: Erik Faye-Lund @ 2011-09-30 14:42 UTC (permalink / raw)
  To: git; +Cc: jaysoffian, peff, gitster
In-Reply-To: <1316118324-6164-1-git-send-email-kusmabite@gmail.com>

On Thu, Sep 15, 2011 at 10:25 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
> ---
>
> I got curious what a credential-helper that uses Windows'
> Credential Manager would look like; this is the result.
>
> Some parts of the code is heavily inspired by Jay Soffian's
> OSX-keychain work.
>
> Not that it's useful yet, since the core-git code for the
> credential-helper support doesn't compile on Windows. So
> it's not fully tested, I've only read the interface
> documentation and experimented with it from the command
> line.
>

...aaand with the following patch applied on top, it pass the
credential-helper test that peff wrote:

---8<---
Subject: [PATCH] fix helper to pass test

---
 .../credential-wincred/git-credential-wincred.c    |   87 +++++++++++---------
 1 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/contrib/credential-wincred/git-credential-wincred.c
b/contrib/credential-wincred/git-credential-wincred.c
index 072c002..278dd80 100644
--- a/contrib/credential-wincred/git-credential-wincred.c
+++ b/contrib/credential-wincred/git-credential-wincred.c
@@ -84,31 +84,35 @@ static void emit_user_pass(WCHAR *username, WCHAR *password)
 		wprintf(L"password=%s\n", password);
 }

-static int find_credentials(WCHAR *target, WCHAR *username)
+static WCHAR *get_target(WCHAR *unique, WCHAR *username)
 {
+	static WCHAR target_buf[4096];
+	_snwprintf(target_buf, sizeof(target_buf), L"gitcred:%s:%s",
+	    unique, username);
+	return target_buf;
+}
+
+static int find_credentials(WCHAR *unique, WCHAR *username)
+{
+	int i;
 	WCHAR user_buf[256], pass_buf[256];
 	DWORD user_buf_size = sizeof(user_buf) - 1,
 	      pass_buf_size = sizeof(pass_buf) - 1;
 	CREDENTIALW **creds, *cred = NULL;
 	DWORD num_creds;

-	if (!CredEnumerateW(target, 0, &num_creds, &creds))
+	if (!CredEnumerateW(get_target(unique, username ? username : L"*"),
+	    0, &num_creds, &creds))
 		return -1;

-	if (!username) {
-		/* no username was specified, just pick the first one */
-		cred = creds[0];
-	} else {
-		/* search for the first credential that matches username */
-		int i;
-		for (i = 0; i < num_creds; ++i)
-			if (!wcscmp(username, creds[i]->UserName)) {
-				cred = creds[i];
-				break;
-			}
-		if (!cred)
-			return -1;
-	}
+	/* search for the first credential that matches username */
+	for (i = 0; i < num_creds; ++i)
+		if (!username || !wcscmp(username, creds[i]->UserName)) {
+			cred = creds[i];
+			break;
+		}
+	if (!cred)
+		return -1;

 	if (!CredUnPackAuthenticationBufferW(0, cred->CredentialBlob,
 	    cred->CredentialBlobSize, user_buf, &user_buf_size, NULL, NULL,
@@ -126,9 +130,9 @@ static int find_credentials(WCHAR *target, WCHAR *username)
 }

 /* also saves the credentials if the user tells it to */
-static int ask_credentials(WCHAR *target, WCHAR *comment, WCHAR *username)
+static int ask_credentials(WCHAR *unique, WCHAR *comment, WCHAR *username)
 {
-	BOOL save = FALSE;
+	BOOL save = unique != NULL;
 	LPVOID auth_buf = NULL;
 	ULONG auth_buf_size = 0;
 	WCHAR user_buf[256], pass_buf[256];
@@ -140,7 +144,7 @@ static int ask_credentials(WCHAR *target, WCHAR
*comment, WCHAR *username)
 	ULONG package = 0;
 	CREDUI_INFOW info = {
 		sizeof(info), NULL,
-		comment ? comment : target, L"Enter password", NULL
+		comment ? comment : unique, L"Enter password", NULL
 	};

 	if (username)
@@ -148,7 +152,7 @@ static int ask_credentials(WCHAR *target, WCHAR
*comment, WCHAR *username)
 		    in_buf, &in_buf_size);
 	err = CredUIPromptForWindowsCredentialsW(&info, 0, &package,
 	    in_buf, in_buf_size, &auth_buf, &auth_buf_size,
-	    &save, CREDUIWIN_GENERIC | CREDUIWIN_CHECKBOX);
+	    &save, CREDUIWIN_GENERIC | (unique ? CREDUIWIN_CHECKBOX : 0));
 	if (err == ERROR_CANCELLED)
 		return 0;
 	if (err != ERROR_SUCCESS)
@@ -169,7 +173,7 @@ static int ask_credentials(WCHAR *target, WCHAR
*comment, WCHAR *username)
 		CREDENTIALW cred;
 		cred.Flags = 0;
 		cred.Type = CRED_TYPE_GENERIC;
-		cred.TargetName = target;
+		cred.TargetName = get_target(unique, user_buf);
 		cred.Comment = comment;
 		cred.CredentialBlobSize = auth_buf_size;
 		cred.CredentialBlob = auth_buf;
@@ -184,17 +188,21 @@ static int ask_credentials(WCHAR *target, WCHAR
*comment, WCHAR *username)
 	return 0;
 }

-static void delete_credentials(WCHAR *target, WCHAR *username)
+static void delete_credentials(WCHAR *unique, WCHAR *username)
 {
-	WCHAR temp[4096];
+	int i;
+	CREDENTIALW **creds;
+	DWORD num_creds;

-	wcscpy(temp, target);
-	if (username) {
-		wcscat(temp, L"|");
-		wcscat(temp, username);
-	}
-	if (!CredDeleteW(target, CRED_TYPE_GENERIC, 0))
-		die("failed to delete credentials");
+	if (!CredEnumerateW(get_target(unique, username ? username : L"*"),
+	    0, &num_creds, &creds))
+		return;
+
+	for (i = 0; i < num_creds; ++i)
+		if (!CredDeleteW(get_target(unique, creds[i]->UserName),
+		    CRED_TYPE_GENERIC, 0))
+			die("failed to delete credentials");
+	CredFree(creds);
 }

 int main(int argc, char *argv[])
@@ -232,9 +240,6 @@ int main(int argc, char *argv[])
 			die("Unrecognized argument `%s'; try --help", arg);
 	}

-	if (!unique)
-		die("Must specify --unique=TOKEN; try --help");
-
 	/* load DLLs */
 	advapi = LoadLibrary("advapi32.dll");
 	credui = LoadLibrary("credui.dll");
@@ -256,16 +261,18 @@ int main(int argc, char *argv[])
 	CredDeleteW = (CredDeleteWT)GetProcAddress(advapi, "CredDeleteW");
 	if (!CredWriteW || !CredUnPackAuthenticationBufferW ||
 	    !CredUIPromptForWindowsCredentialsW || !CredEnumerateW ||
-	    !CredPackAuthenticationBufferW || !CredFree || !CredDeleteW)
+	    !CredFree || !CredPackAuthenticationBufferW || !CredDeleteW)
 		die("failed to load functions");

-	if (reject) {
-		delete_credentials(unique, username);
-		return 0;
-	}
+	if (unique) {
+		if (reject) {
+			delete_credentials(unique, username);
+			return 0;
+		}

-	if (!find_credentials(unique, username))
-		return 0;
+		if (!find_credentials(unique, username))
+			return 0;
+	}

 	if (!ask_credentials(unique, description, username))
 		return 0;
-- 
1.7.7.rc0.257.g9fefc
---8<---

^ permalink raw reply related

* Re: [PATCH/RFCv3 2/2] receive-pack: don't pass non-existent refs to post-{receive,update} hooks in push deletions
From: Pang Yan Han @ 2011-09-30 13:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Sitaram Chamarty, Shawn O. Pearce, Jeff King,
	Johannes Schindelin
In-Reply-To: <7vk48sxn5g.fsf@alter.siamese.dyndns.org>

On Wed, Sep 28, 2011 at 04:28:11PM -0700, Junio C Hamano wrote:
> Pang Yan Han <pangyanhan@gmail.com> writes:
> 
> > I am unable to reply this until much later, but are the tests in this patch ok?
> > It's the first time I'm writing test cases for git.
> 
> I'll squash in a bit more updates and later publish the result.
> Thanks.

Hi Junio,

Thank you for correcting my many amateur mistakes!

Sorry for asking, but do I need to reroll this with the fixup in
origin/ph/push-to-delete-nothing ? Is the commit message fine especially
in light of the changes in the fixup?

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox