Git development

Git development
 help / color / mirror / Atom feed

* [PATCH (topgit) 0/2] tg-patch: fix pagination
From: Kirill Smelkov @ 2009-01-06 15:16 UTC (permalink / raw)
  To: git; +Cc: Kirill Smelkov

Kirill Smelkov (2):
  Implement setup_pager just like in git
  tg-patch: fix pagination

 tg-patch.sh |    3 +++
 tg.sh       |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Johnny Lee @ 2009-01-06 15:05 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: git
In-Reply-To: <slrngm6hoj.n4a.sitaramc@sitaramc.homelinux.net>

Copy that.

Thanks Sitaram.

We also plan to do it in this way, just a small wondering that it
looks a kind of workaround instead of a more graceful solution.

Regards,
Johnny

On Tue, Jan 6, 2009 at 7:57 PM, Sitaram Chamarty <sitaramc@gmail.com> wrote:
> On 2009-01-06, Jeff King <peff@peff.net> wrote:
>> If you are going to have multiple users sharing a repository, generally
>> they should be in the same group and the core.sharedrepository config
>> option should be set (see "git help config", or the "shared" option to
>> git-init).
>>
>> I've never used that personally, though. I have always just used POSIX
>> ACLs, with a default ACL on each directory giving access to everyone.
>> E.g. (off the top of my head):
>>
>>   for user in user1 user2 user3; do
>>     setfacl -R -m u:$user:rwX -m d:u:$user:rwX /path/to/repo
>>   done
>
> If you're not worried about the finer-grained access control
> that acl(5) gives you, just do what "git init
> --shared=group" does:
>
>    git config core.sharedrepository 1 # as mentioned above
>    chmod g+ws .git
>
> Now set the group to something (I use "gitpushers" ;-)
>
>    chgrp -R gitpushers .git
>
> amd make sure all your users are part of that group.
>
> Works fine for small teams...
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
we all have our crosses to bear

^ permalink raw reply

* Re: BUG?? INSTALL MAKEFILE
From: Matthieu Moy @ 2009-01-06 14:56 UTC (permalink / raw)
  To: Lars Sadau; +Cc: git
In-Reply-To: <49635BF8.1010700@sadau-online.de>

Lars Sadau <lars@sadau-online.de> writes:

> Hallo,
>
> i'm a brand-new git user. Just one minute ago I wanted to install git in
> my home directory. The INSTALL file says type simply "make install", but
> the makefile does a global installation.

I was going to write:

  Either run ./configure --prefix=$HOME/wherever/you/want or edit the
  prefix variable in config.mak.

but then realized that prefix is set to $(HOME) by default. Are you
sure you didn't edit the Makefile or run any sort of ./configure
before "make install" ?

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-06 13:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds
In-Reply-To: <7v3afwizi7.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
 <snipp>
> If everything is working correctly, I do not think you will ever have a
> situation where you check "A/B" here and get ENOTDIR back.  lstat("A/B")
> would yield ENOTDIR if "A" is not a directory, but:
>
>  (1) If you did test "A" in the earlier round in the loop, you would have
>      already found it is not a directory and would have exited the loop
>      with LSTAT_ERR.  You cannot test "A/B" in such a case;
>
>  (2) If you did not test "A" in the loop, that has to be because you had a
>      cached information for it, and the caching logic would have returned
>      early if "A" was a non-directory without entering the loop; in other
>      words, you would test "A/B" here without testing "A" in the loop only
>      when the cache says "A" was a directory.  You cannot get ENOTDIR in
>      such a case.
>
> In any case, my main puzzlement still stands.  I think ENOTDIR case should
> behave differently from ENOENT case.
>
> And I think it is an indication of a grave error, either this code is
> racing with an external process that is actively mucking with the work
> tree to make your cached information unusable, or somebody forgot to call
> clear_lstat_cache().
>
> Hmm?

  I have looked at this once more, and I have to admit that what you
  have said is correct.  Sorry for being confusing!  I will update the
  patch by removing the 'errno == ENOTDIR' part from the if-test.

  -- kjetil

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Matthieu Moy @ 2009-01-06 13:27 UTC (permalink / raw)
  To: Johnny Lee; +Cc: Junio C Hamano, git
In-Reply-To: <488807870901060118y2dbc7430ocea5cf9ce4bca3c7@mail.gmail.com>

"Johnny Lee" <johnnylee194@gmail.com> writes:

> Thanks for your consideration Junio,
>
> Peff has already helped me to figure out the root cause of this error
> is a possible bad practice on collaboration work.
>
> Here I attached the previous mail.

See core.sharedRepository in man git-config or
http://www.kernel.org/pub/software/scm/git/docs/git-config.html

-- 
Matthieu

^ permalink raw reply

* BUG?? INSTALL MAKEFILE
From: Lars Sadau @ 2009-01-06 13:26 UTC (permalink / raw)
  To: git

Hallo,

i'm a brand-new git user. Just one minute ago I wanted to install git in
my home directory. The INSTALL file says type simply "make install", but
the makefile does a global installation.

Lars

^ permalink raw reply

* Problems with large compressed binaries when converting from svn
From: Øyvind Harboe @ 2009-01-06 12:55 UTC (permalink / raw)
  To: git

I'm converting from svn and I've run into a
problem with tar.gz and tar.bz2 compressed files.

(This is a separate but only slightly related to previous post).

In subversion we committed large tar.bz2/gz files. These files would
change relatively rarely, but only very slightly.  The trouble with the tar.bz2
format is that if the first byte changes, then the rest of the file will also
be different. .zip does not have this problem, but .zip isn't a very friendly
format for our purposes.

Later on the tar.bz2/gz files started to change fairly often, but harddrives
get bigger much more quickly than the .svn repository grows so we just
kept doing things the same way rather than reeducate and reengineer
the procedures.

With .git we need to handle this differently somehow.

Does git have some capability to store diffs of compressed files efficiently?

The only other alternative I can think of is to commit uncompressed
.tar files which is a bit of a bump in the road, but I suppose could be
made to work.

-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer

^ permalink raw reply

* Re: Migration problems from SVN
From: Øyvind Harboe @ 2009-01-06 12:44 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git
In-Reply-To: <200901061339.49843.trast@student.ethz.ch>

On Tue, Jan 6, 2009 at 1:39 PM, Thomas Rast <trast@student.ethz.ch> wrote:
> Øyvind Harboe wrote:
>> If I early on and accidentally commit a *large* binary object, how do I get rid
>> of it from .git again?
>
> git-filter-branch can do that.  See the top of the EXAMPLES section in

Ah! Thanks! I will study this. Thanks for your patience with a beginner, I guess
I should have spotted this one. There is a lot to read and git is refreshingly
different and there is a lot of things to catch up on.




-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer

^ permalink raw reply

* Re: [PATCH/RFC 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
From: Kjetil Barvik @ 2009-01-06 12:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds
In-Reply-To: <7vk598j17u.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Kjetil Barvik <barvik@broadpark.no> writes:
>
>> Start using the optimised, faster and more effective symlink/directory
>> cache.  The previously used call:
>>
>>    has_symlink_leading_path(len, name);
>>
>> should be identically with the following call to lstat_cache():
>>
>>    lstat_cache(len, name,
>>                LSTAT_SYMLINK|LSTAT_DIR,
>>                LSTAT_SYMLINK);
>> ...
>
> Care to enlighten why some of callers use the above, but not others?
> Namely, check_removed() in diff-lib.c 

  I though that first it would be a good thing to introduce as little
  changes as possible, and then later on do some cleanups.  Regarding
  the 'check_removed()' function, I though that it could later have been
  written something like this after cleanups:

[....]
static int check_removed(const struct cache_entry *ce, struct stat *st)
{
	unsigned int ret_flags =
		check_lstat_cache(ce_namelen(ce), ce->name,
				  LSTAT_SYMLINK|LSTAT_NOTDIR|LSTAT_DIR);

	if (ret_flags & (LSTAT_SYMLINK|LSTAT_NOTDIR))
		return 1;
	if (ret_flags & LSTAT_LSTATERR)
		return -1;

	if (ret_flags & LSTAT_DIR) {
		unsigned char sub[20];
[....]

  This would have saved one more lstat() call in some cases.  But after
  a test, I now see that it does not work.  The reason is that it does
  not set the 'struct stat *st' parameter and/or that for the moment you
  can not tell the 'lstat_cache()' function to also always test the last
  path component.  It could be extended to do this, if someone ask for
  it and if it would be useful to extend the lstat_cache() for this
  fact.

  I will remove the '|LSTAT_NOTDIR' part from the call to lstat_cache()
  in 'check_removed()' in the next version of the patch.

> and callers in unpack-trees.c care about NOTDIR unlike others, even
> though the original code checked for exactly the same condition.

  Regarding the 'verify_absent()' function in unpack-trees.c, the
  '|LSTAT_NOTDIR' part of the call to lstat_cache() helps to avoid 16047
  lstat() calls for the given test case mentioned in the cover-letter.
  And from the source code:

  [...]
	if (lstat_cache(ce_namelen(ce), ce->name,
			LSTAT_SYMLINK|LSTAT_NOTDIR|LSTAT_DIR,
			LSTAT_SYMLINK|LSTAT_NOTDIR))
		return 0;

	if (!lstat(ce->name, &st)) {
  [...]

  it should be easy to see that if we from the lstat_cache() could
  already spot that some path component of ce->name does not exists,
  then we can avoid the following lstat() call, as it then should known
  to be failing.

  regarding the 'unlink_entry()' function in unpack-trees.c, the
  '|LSTAT_NOTDIR' part of the call to lstat_cache() does not for the
  moment helps to avoid any lstat() calls, as far as I can see.  But,
  again, from the source code:

  [..]
	char *name = ce->name;

	if (lstat_cache(ce_namelen(ce), ce->name,
			LSTAT_SYMLINK|LSTAT_NOTDIR|LSTAT_DIR,
			LSTAT_SYMLINK|LSTAT_NOTDIR))
		return;
	if (unlink(name))
		return;
  [...]

  it should be correct, since if we already know that some path
  component of ce-name does not exist, the call to unlink(name) would
  always fail (with ENOENT).

> Does this mean that some callers of has_symlink_leading_path() checked
> only for leading symlinks when they should also have checked for a leading
> non-directory, and this patch is also a bugfix?

  Yes, as indicated above, has_symlink_leading_path() should have
  checked for leading non-directories when called from for the
  'verify_absent()' function to be able to optimise away some more
  lstat() calls.

  I admit that I do not know the source code good enough to decide if
  this is an indication of a bug somewhere, or just an optimisation.

  -- kjetil

^ permalink raw reply

* Re: Migration problems from SVN
From: Thomas Rast @ 2009-01-06 12:39 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901060409p23d2737ck6e41b3f8f1eaf01@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 769 bytes --]

Øyvind Harboe wrote:
> If I early on and accidentally commit a *large* binary object, how do I get rid
> of it from .git again?

git-filter-branch can do that.  See the top of the EXAMPLES section in

  http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html

This requires a rewrite of all history affected; make sure you
understand the implications if you have already published it.  See the
DISCUSSION in the same manpage.

Also note that due to the distributed nature, you cannot force such
changed history upon anyone; i.e., you cannot remove the objects from
anyone else's repository.  You can only ask them to accept your new
history and forget that the old one was ever there.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Migration problems from SVN
From: Øyvind Harboe @ 2009-01-06 12:09 UTC (permalink / raw)
  To: git

Hi all,

I'm trying to migrate from svn to git, but has run into a snag for which
I haven't found any good solutions on the web:

If I early on and accidentally commit a *large* binary object, how do I get rid
of it from .git again?

svn copes with this reasonably well as the binary object will no longer be
downloaded if it is deleted. Also when converting old SVN repositories,
such objects needs to be pruned.  Somewhat related I also need to enumerate
the offending large binary objects that I should get rid of.

It is unpractical to live by the rule that no big objects should ever
be committed
accidentally given the number of people involved and the walks of life they come
from.... So there really needs to be some way to deal with this once the
damage has been done.

-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Sitaram Chamarty @ 2009-01-06 11:57 UTC (permalink / raw)
  To: git
In-Reply-To: <20090106080300.GA10079@coredump.intra.peff.net>

On 2009-01-06, Jeff King <peff@peff.net> wrote:
> If you are going to have multiple users sharing a repository, generally
> they should be in the same group and the core.sharedrepository config
> option should be set (see "git help config", or the "shared" option to
> git-init).
>
> I've never used that personally, though. I have always just used POSIX
> ACLs, with a default ACL on each directory giving access to everyone.
> E.g. (off the top of my head):
>
>   for user in user1 user2 user3; do
>     setfacl -R -m u:$user:rwX -m d:u:$user:rwX /path/to/repo
>   done

If you're not worried about the finer-grained access control
that acl(5) gives you, just do what "git init
--shared=group" does:

    git config core.sharedrepository 1 # as mentioned above
    chmod g+ws .git

Now set the group to something (I use "gitpushers" ;-)

    chgrp -R gitpushers .git

amd make sure all your users are part of that group.

Works fine for small teams...

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Pierre Habouzit @ 2009-01-06 11:39 UTC (permalink / raw)
  To: Johannes Schindelin, Linus Torvalds; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <20090106111712.GB30766@artemis.corp>


[-- Attachment #1.1: Type: text/plain, Size: 2784 bytes --]

On Tue, Jan 06, 2009 at 11:17:12AM +0000, Pierre Habouzit wrote:
> On jeu, jan 01, 2009 at 04:38:09 +0000, Johannes Schindelin wrote:
> > 
> > Nothing fancy, really, just a straight-forward implementation of the
> > heavily under-documented and under-analyzed paience diff algorithm.
> 
> Btw, what is the status of this series ? I see it neither in pu nor in
> next. And I would gladly see it included in git.

Johannes: I've not had time to investigate, but when finding what I
present in the end of this mail, I ran:

    git log -p > log-normal
    git log -p --patience > log-patience

in git.git.

I saw that patience diff is slower than normal diff, which is expected,
but I had to kill the latter command because it leaks like hell. I've
not investigated yet (And yes I'm running your latest posted series).

> On jeu, jan 01, 2009 at 07:45:21 +0000, Linus Torvalds wrote:
> > On Thu, 1 Jan 2009, Johannes Schindelin wrote:
> > > 
> > > Nothing fancy, really, just a straight-forward implementation of the
> > > heavily under-documented and under-analyzed paience diff algorithm.
> > 
> > Exactly because the patience diff is so under-documented, could you 
> > perhaps give a few examples of how it differs in the result, and why it's 
> > so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up 
> > except for *totally* content-free fanboisms. 
> > 
> > So could we have some actual real data on it?

> I've checked in many projects I have under git, the differences between
> git log -p and git log -p --patience. The patience algorithm is really
> really more readable with it involves code moves with changes in the
> moved sections. If the section you move across is smaller than the moved
> ones, the patience algorithm will show the moved code as removed where
> it was and added where it now is, changes included. The current diff
> will rather move the smaller invariend section you move across and
> present mangled diffs involving the function prototypes making it less
> than readable.

Actually git.git has a canonical example of this in 214a34d22.

For those not having the --patience diff applied locally, attached are
the two patches git show / git show --patience give. It's of course a
matter of taste, but I like the patience version a lot more.

I'm also curious to see what a merge conflict with such a move would
look like (e.g. inverting some of the added arguments to the factorized
function of 214a34d22 or something similar). I'm somehow convinced that
it would generate a nicer conflict somehow.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #1.2: 214a34d22.normal --]
[-- Type: text/plain, Size: 2323 bytes --]

commit 214a34d22ec59ec7e1166772f06ecf8799f96c96
Author: Florian Weimer <fw@deneb.enyo.de>
Date:   Sun Aug 31 17:45:04 2008 +0200

    git-svn: Introduce SVN::Git::Editor::_chg_file_get_blob
    
    Signed-off-by: Florian Weimer <fw@deneb.enyo.de>
    Acked-by: Eric Wong <normalperson@yhbt.net>

diff --git a/git-svn.perl b/git-svn.perl
index 0479f41..2c3e13f 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3663,28 +3663,35 @@ sub change_file_prop {
 	$self->SUPER::change_file_prop($fbat, $pname, $pval, $self->{pool});
 }
 
-sub chg_file {
-	my ($self, $fbat, $m) = @_;
-	if ($m->{mode_b} =~ /755$/ && $m->{mode_a} !~ /755$/) {
-		$self->change_file_prop($fbat,'svn:executable','*');
-	} elsif ($m->{mode_b} !~ /755$/ && $m->{mode_a} =~ /755$/) {
-		$self->change_file_prop($fbat,'svn:executable',undef);
-	}
-	my $fh = Git::temp_acquire('git_blob');
-	if ($m->{mode_b} =~ /^120/) {
+sub _chg_file_get_blob ($$$$) {
+	my ($self, $fbat, $m, $which) = @_;
+	my $fh = Git::temp_acquire("git_blob_$which");
+	if ($m->{"mode_$which"} =~ /^120/) {
 		print $fh 'link ' or croak $!;
 		$self->change_file_prop($fbat,'svn:special','*');
-	} elsif ($m->{mode_a} =~ /^120/ && $m->{mode_b} !~ /^120/) {
+	} elsif ($m->{mode_a} =~ /^120/ && $m->{"mode_$which"} !~ /^120/) {
 		$self->change_file_prop($fbat,'svn:special',undef);
 	}
-	my $size = $::_repository->cat_blob($m->{sha1_b}, $fh);
-	croak "Failed to read object $m->{sha1_b}" if ($size < 0);
+	my $blob = $m->{"sha1_$which"};
+	return ($fh,) if ($blob =~ /^0{40}$/);
+	my $size = $::_repository->cat_blob($blob, $fh);
+	croak "Failed to read object $blob" if ($size < 0);
 	$fh->flush == 0 or croak $!;
 	seek $fh, 0, 0 or croak $!;
 
 	my $exp = ::md5sum($fh);
 	seek $fh, 0, 0 or croak $!;
+	return ($fh, $exp);
+}
 
+sub chg_file {
+	my ($self, $fbat, $m) = @_;
+	if ($m->{mode_b} =~ /755$/ && $m->{mode_a} !~ /755$/) {
+		$self->change_file_prop($fbat,'svn:executable','*');
+	} elsif ($m->{mode_b} !~ /755$/ && $m->{mode_a} =~ /755$/) {
+		$self->change_file_prop($fbat,'svn:executable',undef);
+	}
+	my ($fh, $exp) = _chg_file_get_blob $self, $fbat, $m, 'b';
 	my $pool = SVN::Pool->new;
 	my $atd = $self->apply_textdelta($fbat, undef, $pool);
 	my $got = SVN::TxDelta::send_stream($fh, @$atd, $pool);

[-- Attachment #1.3: 214a34d22.patience --]
[-- Type: text/plain, Size: 2290 bytes --]

commit 214a34d22ec59ec7e1166772f06ecf8799f96c96
Author: Florian Weimer <fw@deneb.enyo.de>
Date:   Sun Aug 31 17:45:04 2008 +0200

    git-svn: Introduce SVN::Git::Editor::_chg_file_get_blob
    
    Signed-off-by: Florian Weimer <fw@deneb.enyo.de>
    Acked-by: Eric Wong <normalperson@yhbt.net>

diff --git a/git-svn.perl b/git-svn.perl
index 0479f41..2c3e13f 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3663,6 +3663,27 @@ sub change_file_prop {
 	$self->SUPER::change_file_prop($fbat, $pname, $pval, $self->{pool});
 }
 
+sub _chg_file_get_blob ($$$$) {
+	my ($self, $fbat, $m, $which) = @_;
+	my $fh = Git::temp_acquire("git_blob_$which");
+	if ($m->{"mode_$which"} =~ /^120/) {
+		print $fh 'link ' or croak $!;
+		$self->change_file_prop($fbat,'svn:special','*');
+	} elsif ($m->{mode_a} =~ /^120/ && $m->{"mode_$which"} !~ /^120/) {
+		$self->change_file_prop($fbat,'svn:special',undef);
+	}
+	my $blob = $m->{"sha1_$which"};
+	return ($fh,) if ($blob =~ /^0{40}$/);
+	my $size = $::_repository->cat_blob($blob, $fh);
+	croak "Failed to read object $blob" if ($size < 0);
+	$fh->flush == 0 or croak $!;
+	seek $fh, 0, 0 or croak $!;
+
+	my $exp = ::md5sum($fh);
+	seek $fh, 0, 0 or croak $!;
+	return ($fh, $exp);
+}
+
 sub chg_file {
 	my ($self, $fbat, $m) = @_;
 	if ($m->{mode_b} =~ /755$/ && $m->{mode_a} !~ /755$/) {
@@ -3670,21 +3691,7 @@ sub chg_file {
 	} elsif ($m->{mode_b} !~ /755$/ && $m->{mode_a} =~ /755$/) {
 		$self->change_file_prop($fbat,'svn:executable',undef);
 	}
-	my $fh = Git::temp_acquire('git_blob');
-	if ($m->{mode_b} =~ /^120/) {
-		print $fh 'link ' or croak $!;
-		$self->change_file_prop($fbat,'svn:special','*');
-	} elsif ($m->{mode_a} =~ /^120/ && $m->{mode_b} !~ /^120/) {
-		$self->change_file_prop($fbat,'svn:special',undef);
-	}
-	my $size = $::_repository->cat_blob($m->{sha1_b}, $fh);
-	croak "Failed to read object $m->{sha1_b}" if ($size < 0);
-	$fh->flush == 0 or croak $!;
-	seek $fh, 0, 0 or croak $!;
-
-	my $exp = ::md5sum($fh);
-	seek $fh, 0, 0 or croak $!;
-
+	my ($fh, $exp) = _chg_file_get_blob $self, $fbat, $m, 'b';
 	my $pool = SVN::Pool->new;
 	my $atd = $self->apply_textdelta($fbat, undef, $pool);
 	my $got = SVN::TxDelta::send_stream($fh, @$atd, $pool);

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply related

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Pierre Habouzit @ 2009-01-06 11:17 UTC (permalink / raw)
  To: Johannes Schindelin, Linus Torvalds; +Cc: davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901011730190.30769@pacific.mpi-cbg.de>

[-- Attachment #1: Type: text/plain, Size: 3580 bytes --]

On jeu, jan 01, 2009 at 04:38:09 +0000, Johannes Schindelin wrote:
> 
> Nothing fancy, really, just a straight-forward implementation of the
> heavily under-documented and under-analyzed paience diff algorithm.

Btw, what is the status of this series ? I see it neither in pu nor in
next. And I would gladly see it included in git.

On jeu, jan 01, 2009 at 07:45:21 +0000, Linus Torvalds wrote:
> 
> 
> On Thu, 1 Jan 2009, Johannes Schindelin wrote:
> > 
> > Nothing fancy, really, just a straight-forward implementation of the
> > heavily under-documented and under-analyzed paience diff algorithm.
> 
> Exactly because the patience diff is so under-documented, could you 
> perhaps give a few examples of how it differs in the result, and why it's 
> so wonderful? Yes, yes, I can google, and no, no, nothing useful shows up 
> except for *totally* content-free fanboisms. 
> 
> So could we have some actual real data on it?

For example, look at the following (shamelessly stolen from the real
life example http://glandium.org/blog/?p=120).

With orig.mk reading:
  ] include $(DEPTH)/config/autoconf.mk
  ] 
  ] include $(topsrcdir)/config/rules.mk
  ] 
  ] libs::
  ]        $(INSTALL) $(srcdir)/nsKillAll.js $(DIST)/bin/components
  ] 
  ] clean::
  ]        rm -f $(DIST)/bin/components/nsKillAll.js

With patched.mk reading:
  ] include $(DEPTH)/config/autoconf.mk
  ] 
  ] EXTRA_COMPONENTS = nsKillAll.js
  ] 
  ] include $(topsrcdir)/config/rules.mk
  ] 
  ] clean::
  ]        rm -f $(DIST)/bin/components/nsKillAll.js

You get:

Normal git diff                                    | patience diff
---------------------------------------------------+------------------------------------------
 include $(DEPTH)/config/autoconf.mk               |  include $(DEPTH)/config/autoconf.mk
                                                   |
-include $(topsrcdir)/config/rules.mk              | +EXTRA_COMPONENTS = nsKillAll.js
+EXTRA_COMPONENTS = nsKillAll.js                   | +
                                                   |  include $(topsrcdir)/config/rules.mk
-libs::                                            |
-       $(INSTALL) $(srcdir)/...                   | -libs::
+include $(topsrcdir)/config/rules.mk              | -       $(INSTALL) $(srcdir)/...
                                                   | -
 clean::                                           |  clean::
        rm -f $(DIST)/bin/components/nsKillAll.js  |         rm -f $(DIST)/bin/components/nsKillAll.js



I've checked in many projects I have under git, the differences between
git log -p and git log -p --patience. The patience algorithm is really
really more readable with it involves code moves with changes in the
moved sections. If the section you move across is smaller than the moved
ones, the patience algorithm will show the moved code as removed where
it was and added where it now is, changes included. The current diff
will rather move the smaller invariend section you move across and
present mangled diffs involving the function prototypes making it less
than readable.

Of course, moving code _and_ modifying it at the same time is rather bad
style. BUt (1) it happens and the fact it's bad style is a bad argument
to not show a decent diff of it (2) if you construct diffs of a range of
commits it happens a lot.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: JGit vs. Git
From: Robin Rosenberg @ 2009-01-06 11:12 UTC (permalink / raw)
  To: Vagmi Mudumbai; +Cc: git
In-Reply-To: <a55cfe9d0901052250k2be203dfvb0b437a523f2cecc@mail.gmail.com>

tisdag 06 januari 2009 07:50:11 skrev Vagmi Mudumbai:
> Hi,
> 
> I am a git and a jgit noob. Be gentle. :-)
> 
> 1) Is JGit a drop in replacement of Git? In sense, if I were to pack
> in an SCM with my app, Can I pack jgit instead of C Git?
Short answer: No. JGit is a library for implementing access to a Git
repository. It does not have nearly all functionality and the command
line interface is mostly for testing the internals and so it only has
what's needed for that purpose or because we like to. The command
set will become more complete over time.
> 
> 2) I noticed that there are no 'add' and 'commit' commands (at least
> from the source) in the org.spearce.git.pgm project. I am looking at
> the repo.or.cz/egit.git repo. I had a brief look at the
> lib/GitIndex.java and lib/Repository.java. GitIndex has the add
> methods to add files/entries to the index. I am still stumped on how
> commits can be done with JGit. Any help is hugely appreciated.

With the Eclipse plugin. If you want to implement the add and commit
commands, please do. You won't need much code for it. The class that
does adding in Eclipse is called Track, btw. You can also take a look at
unit tests for how to do things and the other commands for how to
write commands. It's very easy actually. Then submit your patch.

What is most apparently missing from the internals is handling patches.
We have patch reading (but not application), now, plus candidate for
creating diffs (it works, but needs to be libified, byte[]-ified and we want
to make it faster).

> I am working on Windows with msysGit behind a HTTP Proxy. (Life cant
> get worse, I guess.) . I planned on using grit via JRuby but grit uses
> fork which is not available on funny platforms like windows. And JRuby
> guys do not have any plan on supporting fork even on platforms on
> which for is supported. If JGit is a pure Java based implementation of
fork isn't really supported on Windows. Cygwin goes to great lengths to
emulate it. Trying to do that within the context of an arbitrary JVM seems
like a daunting task. Consider submitting patches to make grit not use fork...
just kidding.., please help us improve JGit instead :)

> Git with more or less the same functionality, then my work becomes a
> lot easier.

The intention is to be able to do anything useful with JGit too.

-- robin

^ permalink raw reply

* Re: [PATCH] parse-opt: migrate builtin-ls-files.
From: Pierre Habouzit @ 2009-01-06 10:22 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, git
In-Reply-To: <1231200669-30684-1-git-send-email-vmiklos@frugalware.org>

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

On Tue, Jan 06, 2009 at 12:11:09AM +0000, Miklos Vajna wrote:

> +static int option_parse_no_empty(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct dir_struct *dir = opt->value;
> +
> +	dir->hide_empty_directories = 1;
> +
> +	return 0;
> +}

Should be option_parse_empty and deal with "unset" to know if `no-` was
prefixed to it or not.

> +		{ OPTION_CALLBACK, 0, "no-empty-directory", &dir, NULL,
> +			"do not list empty directories",

This should be "empty-directory" and "list empty directories as well"

I've not checked if you could also check more of the "unsets" things in
your callbacks as well btw, but it looks like it could.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-06  9:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds
In-Reply-To: <7v3afwizi7.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
 <snipp>
> If everything is working correctly, I do not think you will ever have a
> situation where you check "A/B" here and get ENOTDIR back.  lstat("A/B")
> would yield ENOTDIR if "A" is not a directory, but:
>
>  (1) If you did test "A" in the earlier round in the loop, you would have
>      already found it is not a directory and would have exited the loop
>      with LSTAT_ERR.  You cannot test "A/B" in such a case;

  No, if the test lstat("A") already returned -1 it would then have set
  errno to ENOTDIR in this case, I think.  So in this case the cache
  would have only stored "A".

>  (2) If you did not test "A" in the loop, that has to be because you had a
>      cached information for it, and the caching logic would have returned
>      early if "A" was a non-directory without entering the loop; in other
>      words, you would test "A/B" here without testing "A" in the loop only
>      when the cache says "A" was a directory.  You cannot get ENOTDIR in
>      such a case.
>
> In any case, my main puzzlement still stands.  I think ENOTDIR case should
> behave differently from ENOENT case.
>
> And I think it is an indication of a grave error, either this code is
> racing with an external process that is actively mucking with the work
> tree to make your cached information unusable, or somebody forgot to call
> clear_lstat_cache().
>
> Hmm?

  I think that you do not see that the when the lstat() function returns
  errno =

    ENOENT   A component of the path path does not exist, or the path is
             an empty string.
    ENOTDIR  A component of the path is not a directory.

  the cache logic will/must assume that the component we got an error on
  will/must always be the _last_ component in the path.

> I take 1/4 of the above back.

  Will this means that I do not have to update the patch to version 2?

  Note, one thing what I personally do not like with this patch so far,
  is that it makes a new file, and then it kinda hides the comments from
  Linus which talks aboth extending the logic.  It would maybe be better
  to have reused the same file (symlinks.c), but then the function name
  ('lstat_cache()') and the filename would not match.

  -- kjetil

^ permalink raw reply

* Fwd: error: Unable to append to logs/refs/heads/master:
From: Tobin Harris @ 2009-01-06  9:44 UTC (permalink / raw)
  To: git
In-Reply-To: <9a081f0e0901060138g7cd3d92cvc061b7abf6ad6df7@mail.gmail.com>

Hi folks

I'm experiencing a weird error that I can't seem to get around...

$ git push
Counting objects: 67, done.
Compressing objects: 100% (39/39), done.
Writing objects: 100% (43/43), 6.20 KiB, done.
Total 43 (delta 33), reused 0 (delta 0)
Unpacking objects: 100% (43/43), done.
error: Unable to append to logs/refs/heads/master: Invalid argument
To /z/RemoteGit/Zcms.git
 ! [remote rejected] master -> master (failed to write)
error: failed to push some refs to '/z/RemoteGit/Zcms.git'

This usually works fine, so I'm not sure what's happend.

My config looks like this

[core]
        repositoryformatversion = 0
        filemode = false
        bare = false
        logallrefupdates = true
        symlinks = false
        ignorecase = true
[remote "origin"]
        url = /z/RemoteGit/Zcms.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master

Notes: Using with Cygwin + Windows XP, bare repos is on a network
drive.

Any help appreciated.

Tobin

^ permalink raw reply

* Re: [PATCH] Documentation/merge-strategies.txt, diff-options: grammar
From: Jeff King @ 2009-01-06  9:38 UTC (permalink / raw)
  To: Thomas Rast; +Cc: jidanni, git, gitster
In-Reply-To: <200901060944.09435.trast@student.ethz.ch>

On Tue, Jan 06, 2009 at 09:44:06AM +0100, Thomas Rast wrote:

> jidanni@jidanni.org wrote:
> > @@ -11,7 +11,7 @@ resolve::
> >  recursive::
> >  	This can only resolve two heads using 3-way merge
> >  	algorithm.  When there are more than one common
> > -	ancestors that can be used for 3-way merge, it creates a
> > +	ancestor that can be used for 3-way merge, it creates a
> >  	merged tree of the common ancestors and uses that as
> >  	the reference tree for the 3-way merge.  This has been
> >  	reported to result in fewer merge conflicts without
> 
> I'm not a native speaker, but shouldn't this be worded
> 
>   When there _is_ more than one common _ancestor_ that ...
> 
> instead?

I am, and yes.

-Peff

^ permalink raw reply

* Re: [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-06  9:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds
In-Reply-To: <7vprj0j181.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Kjetil Barvik <barvik@broadpark.no> writes:
>
>> +static inline void
>> +update_path_cache(unsigned int ret_flags, unsigned int track_flags,
>> +		  int last_slash)
>> +{
>> +	/* Max 3 different path types can be cached for the moment! */
>> +	unsigned int save_flags =
>> +		ret_flags & track_flags & (LSTAT_DIR|
>> +					   LSTAT_NOTDIR|
>> +					   LSTAT_SYMLINK);
>> +	if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
>> +		cache_flags = save_flags;
>> +		cache_len   = last_slash;
>> +	} else {
>> +		cache_flags = 0;
>> +		cache_len   = 0;
>> +	}
>> +}
>
> I personally found this inline function with a single call site
> distracting in following the logic.  It does not make the indentation
> level shallower, either.  Also, the else part should probably call
> clear_lstat_cache() to protect it from possible future enhancements to add
> more state variables.

  Ok, I will remove the function and update the else-part.

>> +
>> +/*
>> + * Check if name 'name' of length 'len' has a symlink leading
>> + * component, or if the directory exists and is real, or not.
>> + *
>> + * To speed up the check, some information is allowed to be cached.
>> + * This is indicated by the 'track_flags' argument.
>> + */
>> +unsigned int
>> +check_lstat_cache(int len, const char *name, unsigned int track_flags)
>> +{
>> +	int match_len, last_slash, max_len;
>> +	unsigned int match_flags, ret_flags;
>> +	struct stat st;
>> +
>> +	/* Check if match from the cache for 2 "excluding" path types.
>> +	 */
>> +	match_len = last_slash =
>> +		greatest_common_path_cache_prefix(len, name);
>> +	match_flags =
>> +		cache_flags & track_flags & (LSTAT_NOTDIR|
>> +					     LSTAT_SYMLINK);
>> +	if (match_flags && match_len == cache_len)
>> +		return match_flags;
>
> Let me see if I understand the logic behind this caching.  When you have
> checked A/B/C earlier and you already know B is a symlink, you remember
> that A/B was a symlink..  You can fill a request to check A/B/$whatever
> (assuming A/B does not change --- otherwise the caller should clear the
> cache) from the cached data, because no matter what $whatever is, it will
> result in the same "has-leading-symlink".
>
> Similarly, if you know A/B is not a directory from an earlier test, you
> know that a request to check A/B/$whatever will result in the same ENOTDIR
> no matter what $whatever is, so you can return early.
>
> The above "return match_flags" will not trigger if the cached path does
> not have any leading symlink.  So we know the matched part are all good
> directories when we start lstat() loop.
>
> Am I following you so far?

  Yes you do!

>> +	/* Okay, no match from the cache so far, so now we have to
>> +	 * check the rest of the path components and update the cache.
>> +	 */
>> +	ret_flags = LSTAT_DIR;
>> +	max_len = len < PATH_MAX ? len : PATH_MAX;
>> +	while (match_len < max_len) {
>> +		do {
>> +			cache_path[match_len] = name[match_len];
>> +			match_len++;
>> +		} while (match_len < max_len && name[match_len] != '/');
>
> You take one component from the input, and append it to the part that is
> already known to be true directory (i.e. cached part and the part earlier
> iteration of the loop checked so far), to be tested by lstat()...
>
>> +		if (match_len >= max_len)
>> +			break;
>
> ... but you are not interested in the full input.  

  If the lengt of name is larger than PATH_MAX all lstat() calls would
  fail with an ENAMETOOLONG error, at least on my Linux box, so I
  thought that it was not nessarry to test further.  But maybe we should
  emdiatly return an error if name is too long?

> We are only checking the leading path (e.g. check for "A/B/C" may
> lstat() "A", "A/B" but not "A/B/C").

  That is correct, and it is the same logic as in the
  has_symlink_leading_path() function.

>> +		last_slash = match_len;
>> +		cache_path[last_slash] = '\0';
>> +
>> +		if (lstat(cache_path, &st)) {
>> +			ret_flags = LSTAT_LSTATERR;
>> +			if (errno == ENOENT || errno == ENOTDIR)
>> +				ret_flags |= LSTAT_NOTDIR;
>
> If you tested "A/B" here and got ENOENT back, you know "A/B" does not
> exist; you cache this knowledge as "A/B is not a directory" 

  Correct.

> (I also think you could use it as a cached knowledge that "A exists
> and is a directory".  I am not sure if you are taking advantage of
> that).

  It does take adavantage of this fact.  It will also do simmilar things
  with a symlink cached path.

> What I do not understand about this code is the ENOTDIR case.  If you
> tested "A/B" and got ENOTDIR back, what you know is that "A" is not a
> directory (if the path tested this round were deeper like "X/Y/A/B", you
> know "X/Y/A" is not a directory, and you know "X" and "X/Y" are true
> directories; otherwise the loop would have exited before this round when
> you tested "X" or "X/Y" in the earlier rounds).

  Since the cache is supoosed to start from a known existing directory,
  and is testing each path component when the lstat("A/B") calls returns
  ENOTDIR, we should know the fact that the directory "A" exists, and
  that "A/B/" does not exists.
  
> So as far as I can think of, ENOENT case and ENOTDIR case would give you
> different information (ENOENT would say "A is a dir, A/B is not"; ENOTDIR
> would say "A is not a dir").  I am confused how you can cache the same
> path and same flag between these two cases here.

  I admit that I used copy-and-paste from other similar test's from the
  sourcecode:

kjetil@localhost ~/git/git $ grep -Hn ENOENT.*ENOTDIR *.c
builtin-apply.c:2364:	else if ((errno != ENOENT) && (errno != ENOTDIR))
builtin-update-index.c:74: *  - missing file (ENOENT or ENOTDIR). That's ok if we're
builtin-update-index.c:81:	if (err == ENOENT || err == ENOTDIR)
diff-lib.c:30:		if (errno != ENOENT && errno != ENOTDIR)
lstat_cache.c:81:			if (errno == ENOENT || errno == ENOTDIR)
setup.c:191:	if (errno != ENOENT && errno != ENOTDIR)

  From the 'man lstat' page I read the following for this 2 error codes:

    ENOENT    A component of the path _path_ does not exist, or the path is an empty string.
    ENOTDIR   A component of the path is not a directory.

  I would have guessed that for what we is looking for, it would be
  correct to threat both these error codes as the same.  Is this wrong?

  -- kjetil

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Johnny Lee @ 2009-01-06  9:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vy6xohkt2.fsf@gitster.siamese.dyndns.org>

Thanks for your consideration Junio,

Peff has already helped me to figure out the root cause of this error
is a possible bad practice on collaboration work.

Here I attached the previous mail.

>>>>>>
Thanks Peff, I've checked the permission of .git/objects/16, it's
created by another user and thus I have no permission to remove it.

In fact, this is coming from a previous bad practice on setting up a
collaboration repository on a SSH server, here is what I've done so
far:
1. I have a server tomato, which has two users: "git" and "johnny"

2. User "git" has cloned a repository from elsewhere as ~git/golf/.git

3. User "johnny" has cloned this repository from git on another
machine, using command:
git clone johnny@tomato:~git/golf/.git localgit

4. It works fine, and user "johnny" has made some local commits and
wants to push the changes to git's repository, so he uses:
git push

5. But it's reported some errors about permissions, like:
error: Unable to append to logs/refs/heads/cupcake: Permission denied
error: Unable to append to logs/refs/remotes/origin/cupcake: Permission denied
error: Unable to append to logs/refs/remotes/origin/cupcake: Permission denied
ng refs/heads/cupcake failed to write
ng refs/remotes/origin/HEAD failed to write
ng refs/remotes/origin/cupcake failed to write
error: failed to push to 'johnny@tomato:~git/golf/.git'

6. That's normal, since the ~git/golf/.git is created by user "git",
and user "johnny" can't write it.

7. Then the user "git" has changed mode for all the files under .git
to writable.

8. This time, user "johnny" can push successfully.

9. But as you can see here, some files are created by user "johnny",
and thus user "git" can't write them.

It seems not a good practice for collaboration work, would you please
share your experiences with us?

Thanks very much and regards,
Johnny

On Tue, Jan 6, 2009 at 4:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Johnny Lee" <johnnylee194@gmail.com> writes:
>
>> While I'm looking at these "unable to unlink" files, it seems they are
>> read only:
>> git@tomato:~/golf$ ls -l .git/objects/16/
>> total 4
>> -r--r--r-- 1 johnny johnny 26 2009-01-05 09:25
>> b14f5da9e2fcd6f3f38cc9e584cef2f3c90ebe
>
>
> Check "ls -ld .git/objects/16" and see if it is owned by you and writable
> by you.
>
> In sane POSIX filesystem semantics, it should not pose with problem with
> respect to removal that a file is unwritable.
>
> What counts is the writability of the parent directory.
>

-- 
we all have our crosses to bear

^ permalink raw reply

* Re: [ANNOUNCE] Git homepage change
From: Adeodato Simó @ 2009-01-06  9:07 UTC (permalink / raw)
  To: Scott Chacon; +Cc: git
In-Reply-To: <d411cc4a0901051749p2ef880bub45bba1c0d41bfc7@mail.gmail.com>

* Scott Chacon [Mon, 05 Jan 2009 17:49:01 -0800]:

> Also, Miklos : your patch changes the format of the output and doesn't
> speed things up or anything, so I'm just sticking with the current
> script for now (unless I'm missing something).

-8<-
git log --pretty=short --no-merges | git shortlog -n | grep -v -e '^ ' | grep ':' > ../gitscm/config/authors.txt

    data = author.split(' ')
    number = data.pop.gsub('(', '').gsub(')', '').chomp
    name = data.join(' ')
-8<-

vs

-8<- v2 -8<-
git shortlog --no-merges -sn > ../gitscm/config/authors.txt

    if author =~ /(\d+)\s+(.+)/
        name, number = $2, $1
-8<-

HTH.

P.S.: And it is a tad faster, yes (not that that should be the primary
criteria).

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
The true teacher defends his pupils against his own personal influence.
                -- Amos Bronson Alcott

^ permalink raw reply

* Re: Error: unable to unlink ... when using "git gc"
From: Junio C Hamano @ 2009-01-06  8:59 UTC (permalink / raw)
  To: Johnny Lee; +Cc: git
In-Reply-To: <488807870901052300y57f59b90rdc03cc47c790b416@mail.gmail.com>

"Johnny Lee" <johnnylee194@gmail.com> writes:

> While I'm looking at these "unable to unlink" files, it seems they are
> read only:
> git@tomato:~/golf$ ls -l .git/objects/16/
> total 4
> -r--r--r-- 1 johnny johnny 26 2009-01-05 09:25
> b14f5da9e2fcd6f3f38cc9e584cef2f3c90ebe

Check "ls -ld .git/objects/16" and see if it is owned by you and writable
by you.

In sane POSIX filesystem semantics, it should not pose with problem with
respect to removal that a file is unwritable.

What counts is the writability of the parent directory.

^ permalink raw reply

* Re: [PATCH] Documentation/gitcore-tutorial.txt: HEAD not symlink these days
From: Miklos Vajna @ 2009-01-06  8:57 UTC (permalink / raw)
  To: jidanni; +Cc: git, gitster
In-Reply-To: <87ljtpnoag.fsf@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 330 bytes --]

On Tue, Jan 06, 2009 at 10:47:35AM +0800, jidanni@jidanni.org wrote:
> OK please change it for me, as today is my junior patching practice
> day, and a patch to a patch is way over my head.

I can, but in the longer term, I think it's better if you learn to use
git rebase -i / git commit --amend. You can do it tomorrow as well.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection
From: Junio C Hamano @ 2009-01-06  8:56 UTC (permalink / raw)
  To: Kjetil Barvik; +Cc: git, Linus Torvalds
In-Reply-To: <7vprj0j181.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Let me see if I understand the logic behind this caching....
> ...
>> +		last_slash = match_len;
>> +		cache_path[last_slash] = '\0';
>> +
>> +		if (lstat(cache_path, &st)) {
>> +			ret_flags = LSTAT_LSTATERR;
>> +			if (errno == ENOENT || errno == ENOTDIR)
>> +				ret_flags |= LSTAT_NOTDIR;
>
> If you tested "A/B" here and got ENOENT back, you know "A/B" does not
> exist; you cache this knowledge as "A/B is not a directory" (I also think
> you could use it as a cached knowledge that "A exists and is a directory".
> I am not sure if you are taking advantage of that).
>
> What I do not understand about this code is the ENOTDIR case.  If you
> tested "A/B" and got ENOTDIR back, what you know is that "A" is not a
> directory (if the path tested this round were deeper like "X/Y/A/B", you
> know "X/Y/A" is not a directory, and you know "X" and "X/Y" are true
> directories; otherwise the loop would have exited before this round when
> you tested "X" or "X/Y" in the earlier rounds).
>
> So as far as I can think of, ENOENT case and ENOTDIR case would give you
> different information (ENOENT would say "A is a dir, A/B is not"; ENOTDIR
> would say "A is not a dir").  I am confused how you can cache the same
> path and same flag between these two cases here.

I take 1/4 of the above back.

If everything is working correctly, I do not think you will ever have a
situation where you check "A/B" here and get ENOTDIR back.  lstat("A/B")
would yield ENOTDIR if "A" is not a directory, but:

 (1) If you did test "A" in the earlier round in the loop, you would have
     already found it is not a directory and would have exited the loop
     with LSTAT_ERR.  You cannot test "A/B" in such a case;

 (2) If you did not test "A" in the loop, that has to be because you had a
     cached information for it, and the caching logic would have returned
     early if "A" was a non-directory without entering the loop; in other
     words, you would test "A/B" here without testing "A" in the loop only
     when the cache says "A" was a directory.  You cannot get ENOTDIR in
     such a case.

In any case, my main puzzlement still stands.  I think ENOTDIR case should
behave differently from ENOENT case.

And I think it is an indication of a grave error, either this code is
racing with an external process that is actively mucking with the work
tree to make your cached information unusable, or somebody forgot to call
clear_lstat_cache().

Hmm?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox