Git development
 help / color / mirror / Atom feed
* Re: [PATCH] allow 8bit data in email body sent by send-email
From: Andre Przywara @ 2009-01-09 14:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Andre Przywara
In-Reply-To: <20090109072814.GA21180@coredump.intra.peff.net>

Jeff King wrote:
>> when sending patch files via git send-email, the perl script assumes
>> 7bit characters only. If there are other bytes in the body (foreign language
>> characters in names or translations), some servers (like vger.kernel.org)
>> reject the mail because of th?t. This patch always adds an 8bit header line
>> to each mail.
> 
> This should be done already by git-format-patch when you generate the
> patch to feed to send-email.
Well, this could be discussed, after all the problem lies in the actual 
transportation, which should be the responsibility of git-send-email. 
But I am OK with putting this into format-patch.
 > What exactly is the workflow you use to generate this problem?
I use git format-patch to generate a patch file for a single-mail patch 
(not a patch series). Then I edit this file manually to add questions 
and comments and include my signature. During this step the umlauts came 
in. If you have a suggestion to improve this workflow, I am all ears, I 
am fairly new to git.
> Does it matter where the non-ascii characters are
> (commit versus patch, etc)?
Oh, right you are. If there are 8bit characters in the commit message, 
git-format-patch adds the appropriate headers.
 > What version of git are you using?
Version 1.5.5 on one machine and 1.5.2.2 on another. I know, i know ;-) 
but I haven't had time to compile a newer one, yet.

> 
>> diff --git a/git-send-email.perl b/git-send-email.perl
>> index 77ca8fe..68a462c 100755
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -793,6 +793,7 @@ To: $to${ccline}
>>  Subject: $subject
>>  Date: $date
>>  Message-Id: $message_id
>> +Content-Transfer-Encoding: 8bit
>>  X-Mailer: git-send-email $gitversion
>>  ";
> 
> This fix isn't right anyway. For one thing, if you're going to include
> C-T-E, you should also include a MIME-Version header. But more
> importantly, we are already handling encoding elsewhere. So
> unconditionally adding this means that you may conflict with existing
> MIME headers in the @xh variable.

Ok, so what about adding a flag to git-format-patch that forces the 8bit 
headers on? I think a workaround would be to add a --subject-prefix with 
a special character and later remove this, but this is not really a 
long-term solution ;-)

Thanks and regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply

* Re: [PATCH] allow 8bit data in email body sent by send-email
From: Jeff King @ 2009-01-09 14:44 UTC (permalink / raw)
  To: Andre Przywara; +Cc: git
In-Reply-To: <49675C38.8060208@amd.com>

On Fri, Jan 09, 2009 at 03:16:24PM +0100, Andre Przywara wrote:

>> This should be done already by git-format-patch when you generate the
>> patch to feed to send-email.
> Well, this could be discussed, after all the problem lies in the actual  
> transportation, which should be the responsibility of git-send-email. But 
> I am OK with putting this into format-patch.

I didn't mean "this functionality should go into format-patch" but
rather "this functionality is _already_ in format-patch, and it should
have been triggered".

The reason it has to be in format-patch is that only format-patch knows
what the correct encoding is. It's not that useful to just say "oh, this
is some 8-bit data." You also want to give a content-type header that
specifies the correct encoding. And anything that contains non-ascii
characters should come out of format-patch with such a header.

> > What exactly is the workflow you use to generate this problem?
> I use git format-patch to generate a patch file for a single-mail patch  
> (not a patch series). Then I edit this file manually to add questions and 
> comments and include my signature. During this step the umlauts came in. 
> If you have a suggestion to improve this workflow, I am all ears, I am 
> fairly new to git.

Ah, I see. I'm not sure what the best solution is there. send-email has
intentionally been kept pretty dumb, because implementing full MUA
behavior would make it pretty unwieldy. You could add an option to
send-email to add the 8-bit transfer-encoding header if necessary, but
it will have to guess at (or be configured to know) the correct encoding
of the characters.

Personally, when I want to add information like that to a patch, I pull
the output of format-patch into my MUA (mutt, in my case). I don't know
if that is a workable solution for you.

> Ok, so what about adding a flag to git-format-patch that forces the 8bit  
> headers on? I think a workaround would be to add a --subject-prefix with  
> a special character and later remove this, but this is not really a  
> long-term solution ;-)

Now that you've explained your workflow, I do think send-email is a more
appropriate place to add a header, since format-patch never even sees
the data that is causing the problem. Probably the sanest thing would be
to check each input file for non-ascii characters. If they are found,
and the message does not already have some MIME headers, then add an
8bit content-transfer-encoding and a text/plain content-type. In the
latter, you would need to specify some encoding. Most of git defaults
to utf-8, but it should probably be configurable.

We have to do a similar thing for the --compose option, so looking at
what that does is probably a good starting point.

-Peff

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-09 15:59 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Linus Torvalds, Clemens Buchacher,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901091405460.30769@pacific.mpi-cbg.de>

* Johannes Schindelin [Fri, 09 Jan 2009 14:07:28 +0100]:

> > If we find the "common" context lines that have only blank and 
> > punctuation letters in Dscho output, turn each of them into "-" and "+", 
> > and rearrange them so that all "-" are together followed by "+", it will 
> > match Bzr output.

> So we'd need something like this (I still think we should treat curly 
> brackets the same as punctuation, and for good measure I just handled 
> everything that is not alphanumerical the same):

Nice. With this patch of yours, --patience --collapse-non-alnums
produces the same output as bzr for this last test case (the util_sock.c
one). However, also for this last case, without --patience, diff
--collapse-non-alnums finds *no* common lines at all. Mentioning in case
you'd be interested in knowing.

Cheers,

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
- You look beaten.
- I just caught Tara laughing with another man.
- Are you sure they weren't just... kissing or something?
- No, they were laughing.
                -- Denny Crane and Alan Shore

^ permalink raw reply

* Re: [PATCH 1/2] bash completion: Add '--intent-to-add' long option for 'git add'
From: Lee Marlow @ 2009-01-09 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <20081210194131.GB11928@spearce.org>

On Wed, Dec 10, 2008 at 12:41 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Lee Marlow <lee.marlow@gmail.com> wrote:
>> Signed-off-by: Lee Marlow <lee.marlow@gmail.com>
>> ---
>>  contrib/completion/git-completion.bash |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> Trivially-Acked-by: Shawn O. Pearce <spearce@spearce.org>
>

Bump :)

^ permalink raw reply

* Re: [PATCH 2/2] bash completion: Use 'git add' completions for 'git stage'
From: Lee Marlow @ 2009-01-09 16:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <20081210195957.GE11928@spearce.org>

On Wed, Dec 10, 2008 at 12:59 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Lee Marlow <lee.marlow@gmail.com> wrote:
>> Signed-off-by: Lee Marlow <lee.marlow@gmail.com>
>> ---
>>  contrib/completion/git-completion.bash |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> Also,
>
> Trivially-Acked-by: Shawn O. Pearce <spearce@spearce.org>

Nudge

^ permalink raw reply

* Re: [PATCH, resend] git-commit: colored status when color.ui is set
From: Markus Heidelberg @ 2009-01-09 16:24 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git
In-Reply-To: <alpine.DEB.1.00.0901091155210.30769@pacific.mpi-cbg.de>

Johannes Schindelin, 09.01.2009:
> Hi,
> 
> On Fri, 9 Jan 2009, Junio C Hamano wrote:
> 
> > Markus Heidelberg <markus.heidelberg@web.de> writes:
> > > When using "git commit" and there was nothing to commit (the editor
> > > wasn't launched), the status output wasn't colored, even though color.ui
> > > was set. Only when setting color.status it worked.
> > 
> > My first reaction was:
> > 
> > 	When the editor does get launched, what would the new code do with
> > 	your patch?  Would we see bunch of escape codes in the editor now?

Of course I tested this case :)

> > But we do disable color explicitly when we generate contents to feed the
> > editor in that case since bc5d248 (builtin-commit: do not color status
> > output shown in the message template, 2007-11-18), so that fear is
> > unfounded.
> 
> I had the same reaction, so I would like to see this reasoning in the 
> commit message.

wt_status_use_color could have already been set during git_config()
without this patch, just not with color.ui, but color.status, so this is
not really a new case. But I can understand the reaction and don't have
ocjections against this explanation in the commit message.

Markus

^ permalink raw reply

* [PATCH v2] t7501-commit.sh: explicitly check that -F prevents invoking the editor
From: Adeodato Simó @ 2009-01-09 17:30 UTC (permalink / raw)
  To: git, gitster; +Cc: Johannes Schindelin, Adeodato Simó
In-Reply-To: <alpine.DEB.1.00.0812301250210.30769@pacific.mpi-cbg.de>

The "--signoff" test case in t7500-commit.sh was setting VISUAL while
using -F -, which indeed tested that the editor is not spawned with -F.
However, having it there was confusing, since there was no obvious reason
to the casual reader for it to be there.

This commits removes the setting of VISUAL from the --signoff test, and
adds in t7501-commit.sh a dedicated test case, where the rest of tests for
-F are.

Signed-off-by: Adeodato Simó <dato@net.com.org.es>
---
* Johannes Schindelin [Tue, 30 Dec 2008 13:04:46 +0100]:

> Hmm.  Obviously, I failed to document properly why I tested the editor, 
> but I think it makes sense to assume that -F still triggered an 
> interactive editor at some stage in the development of builtin commit.

> I do not have anything against separating that issue into another test 
> case, but I am strongly opposed to simply removing it.

Ok, I've moved it to a separate test case, please review to see if you
approve of it.

Thanks,

 t/t7500-commit.sh |    5 +----
 t/t7501-commit.sh |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/t/t7500-commit.sh b/t/t7500-commit.sh
index 6e18a96..5998baf 100755
--- a/t/t7500-commit.sh
+++ b/t/t7500-commit.sh
@@ -149,10 +149,7 @@ EOF
 
 test_expect_success '--signoff' '
 	echo "yet another content *narf*" >> foo &&
-	echo "zort" | (
-		test_set_editor "$TEST_DIRECTORY"/t7500/add-content &&
-		git commit -s -F - foo
-	) &&
+	echo "zort" | git commit -s -F - foo &&
 	git cat-file commit HEAD | sed "1,/^$/d" > output &&
 	test_cmp expect output
 '
diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 63bfc6d..b4e2b4d 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -127,6 +127,26 @@ test_expect_success \
 	"showing committed revisions" \
 	"git rev-list HEAD >current"
 
+cat >editor <<\EOF
+#!/bin/sh
+sed -e "s/good/bad/g" < "$1" > "$1-"
+mv "$1-" "$1"
+EOF
+chmod 755 editor
+
+cat >msg <<EOF
+A good commit message.
+EOF
+
+test_expect_success \
+	'editor not invoked if -F is given' '
+	 echo "moo" >file &&
+	 VISUAL=./editor git commit -a -F msg &&
+	 git show -s --pretty=format:"%s" | grep -q good &&
+	 echo "quack" >file &&
+	 echo "Another good message." | VISUAL=./editor git commit -a -F - &&
+	 git show -s --pretty=format:"%s" | grep -q good
+	 '
 # We could just check the head sha1, but checking each commit makes it
 # easier to isolate bugs.
 
-- 
1.6.1.134.g55c35

^ permalink raw reply related

* git-svn: File was not found in commit
From: Morgan Christiansson @ 2009-01-09 17:19 UTC (permalink / raw)
  To: git

Hi, i'm trying to "git svn fetch" my repository from a local file:///
repo and i'm running into this problem:

$ git svn init -t tags -b branches -T trunk file:///path/to/svn/repo
$ git svn fetch
branches/rails/rails/vendor/plugins/acts_as_xapian/.git/refs/heads/master
was not found in commit a643e882c557593f36bb9fd0966490010b9dba61 (r10576)


I found another report that seems to describe the same error:
http://marc.info/?l=git&m=121537767308135&w=2
Investigating the the history it's committed in r10577 and it's looking
for it in r10576, so it seems to be off by one revision number. Exactly
like the other report.
I've tried the latest git version of git-svn.perl and the problem is not
fixed there.


$ svn log file:///path/to/repo -r10576:10577 -v
------------------------------------------------------------------------
r10576 | morgan | 2008-11-28 14:35:53 +0000 (Fri, 28 Nov 2008) | 3 lines
Changed paths:
   A /branches/rails/rails/app/controllers/browse_sheetmusic_controller.rb
   M /branches/rails/rails/app/controllers/scores_controller.rb
   M /branches/rails/rails/app/models/composer.rb
   M /branches/rails/rails/app/models/score.rb
   M /branches/rails/rails/config/routes.rb

Commit message.

------------------------------------------------------------------------
r10577 | morgan | 2008-11-28 18:31:00 +0000 (Fri, 28 Nov 2008) | 3 lines
Changed paths:
   A /branches/rails/rails/vendor/plugins/acts_as_xapian/.git/FETCH_HEAD
   M /branches/rails/rails/vendor/plugins/acts_as_xapian/.git/config
   M /branches/rails/rails/vendor/plugins/acts_as_xapian/.git/index
   M /branches/rails/rails/vendor/plugins/acts_as_xapian/.git/logs/HEAD
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/logs/refs/heads/master 

# <-- THIS FILE
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/logs/refs/remotes/origin/HEAD
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/logs/refs/remotes/origin/master
   A
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/objects/pack/pack-41ebdff27c581340ac7a71850e2e3a7d1cfea138.idx
   A
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/objects/pack/pack-41ebdff27c581340ac7a71850e2e3a7d1cfea138.pack
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/refs/heads/master
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/.git/refs/remotes/origin/master
   A /branches/rails/rails/vendor/plugins/acts_as_xapian/README.textile
   M
/branches/rails/rails/vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb

Switched repo to git://github.com/Overbryd/acts_as_xapian.git

------------------------------------------------------------------------




I did some digging in the perl script and managed to generate this stack
trace, it shows that gs_do_update is called with $rev_a=10576 and
$rev_b=10577, the file is in $rev_b but it complains it's not found in
$rev_a.

SVN::Git::Fetcher::open_file('SVN::Git::Fetcher=HASH(0x25faf38)',
'branches/rails/rails/vendor/plugins/acts_as_xapian/.git/refs/heads/master', 

'HASH(0x25fdb00)', 10576, '_p_apr_pool_t=SCALAR(0x24f8978)') called at
/usr/lib/perl5/SVN/Ra.pm line 623
SVN::Ra::Reporter::AUTOLOAD('SVN::Ra::Reporter=ARRAY(0x24f8948)',
'SVN::Pool=REF(0x24f8528)') called at ../git-svn.perl line 4087
Git::SVN::Ra::gs_do_update('Git::SVN::Ra=HASH(0x24beac8)', 10576, 10577,
'Git::SVN=HASH(0x24f7d18)', 'SVN::Git::Fetcher=HASH(0x25faf38)') called
at ../git-svn.perl line 2481
Git::SVN::do_fetch('Git::SVN=HASH(0x24f7d18)', 'HASH(0x24c01f0)', 10577)
called at ../git-svn.perl line 4227
Git::SVN::Ra::gs_fetch_loop_common('Git::SVN::Ra=HASH(0x24beac8)',
10575, 10724, 'ARRAY(0x1da1c20)', 'ARRAY(0x1da1c50)') called at
../git-svn.perl line 1506
Git::SVN::fetch_all('svn', 'HASH(0x21d6440)') called at ../git-svn.perl
line 387
main::cmd_fetch at ../git-svn.perl line 268
eval {...} at ../git-svn.perl line 266
branches/rails/rails/vendor/plugins/acts_as_xapian/.git/refs/heads/master
was not found in commit a643e882c557593f36bb9fd0966490010b9dba61
(r10576) at ../git-svn.perl line 3271.


I'm not sure whether this is correct behavior or not and I'm not
familiar with SVN::Ra::Reporter... so some help would be appreciated.

Thanks,
Morgan

^ permalink raw reply

* Curious about details of optimization of object database...
From: chris @ 2009-01-09 17:46 UTC (permalink / raw)
  To: git

I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
tree.

Can anyone say, in a few sentences, how git avoids needing to keep multiple
slightly different copies of entire files without just storing lots of
patches/diffs?

cs

^ permalink raw reply

* Re: Curious about details of optimization of object database...
From: David Brown @ 2009-01-09 17:56 UTC (permalink / raw)
  To: chris; +Cc: git
In-Reply-To: <20090109174623.GC12552@seberino.org>

On Fri, Jan 09, 2009 at 09:46:23AM -0800, chris@seberino.org wrote:
>I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
>tree.
>
>Can anyone say, in a few sentences, how git avoids needing to keep multiple
>slightly different copies of entire files without just storing lots of
>patches/diffs?

   Documentation/technical/pack-heuristics.txt

David

^ permalink raw reply

* Re: Curious about details of optimization of object database...
From: Matthieu Moy @ 2009-01-09 17:55 UTC (permalink / raw)
  To: chris; +Cc: git
In-Reply-To: <20090109174623.GC12552@seberino.org>

chris@seberino.org writes:

> I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
> tree.

Conceptually, yes. But obviously, the storage format (pack) does what
people usually call "delta-compression", which is basically storing
only the diff against another, similar object.

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-09 18:09 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Adeodato Simó, Clemens Buchacher,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901091405460.30769@pacific.mpi-cbg.de>



On Fri, 9 Jan 2009, Johannes Schindelin wrote:
> 
> -- snipsnap --
> [TOY PATCH] Add diff option '--collapse-non-alnums'

I really don't think it should be about "alnum".

Think about languages that use "begin" and "end" instead of "{" "}".

I think we'd be better off just looking at the size, but _this_ is a 
really good area where "uniqueness" matters.

Don't combine unique lines, but lines that have the same hash as a 
thousand other lines? Go right ahead.

		Linus

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-09 18:13 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Adeodato Simó, Clemens Buchacher,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901091001260.6528@localhost.localdomain>



On Fri, 9 Jan 2009, Linus Torvalds wrote:
>
> Don't combine unique lines, but lines that have the same hash as a 
> thousand other lines? Go right ahead.

Oh, and even then, don't ever combine fragments unless it's a single line, 
and unless you can really stop with just one or two combines. Because we 
don't want to combine 100 one-liner changes (on every other line) into one 
300-line change.

		Linus

^ permalink raw reply

* Re: Curious about details of optimization of object database...
From: Nicolas Pitre @ 2009-01-09 18:34 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: chris, git
In-Reply-To: <vpqzli01hzl.fsf@bauges.imag.fr>

On Fri, 9 Jan 2009, Matthieu Moy wrote:

> chris@seberino.org writes:
> 
> > I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
> > tree.
> 
> Conceptually, yes. But obviously, the storage format (pack) does what
> people usually call "delta-compression", which is basically storing
> only the diff against another, similar object.

Also, since objects representing files and directories are named after 
their actual content, having two commits with identical files and 
directories will of course share the same blob and tree objects for 
those identical parts.


Nicolas

^ permalink raw reply

* (unknown)
From: nathan.panike @ 2009-01-09 19:02 UTC (permalink / raw)


>From 65c4fed27fe9752ffd0e3b7cb6807561a4dd4601 Mon Sep 17 00: 00:00 2001
From: Nathan W. Panike <nathan.panike@gmail.com>
Date: Fri, 9 Jan 2009 11:53:43 -0600
Subject: [PATCH] Get format-patch to show first commit after root commit

Currently, the command

git format-patch -1 e83c5163316f89bfbde

in the git repository creates an empty file.  Instead, one is
forced to do

git format-patch -1 --root e83c5163316f89bfbde

This seems arbitrary.  This patch fixes this case, so that

git format-patch -1 e83c5163316f89bfbde

will produce an actual patch.
---
 builtin-log.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 4a02ee9..5e7b61f 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -977,6 +977,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		list[nr - 1] = commit;
 	}
 	total = nr;
+	if (total == 1 && !list[0]->parents)
+		rev.show_root_diff=1;
 	if (!keep_subject && auto_number && total > 1)
 		numbered = 1;
 	if (numbered)
-- 
1.6.1.76.gc123b.dirty

^ permalink raw reply related

* [PATCH/RFC v5 0/1] git checkout: optimise away lots of lstat() calls
From: Kjetil Barvik @ 2009-01-09 19:05 UTC (permalink / raw)
  To: git; +Cc: Pete Harlan, Linus Torvalds, Junio C Hamano, Kjetil Barvik

Changes since version 4:

- After looking at the changes in create_directories() inside entry.c
  for some more time, I got the idea to just use the lstat_cache()
  instead of almost use a separate cache implementation of its own.

- To be able to implement that, I had to be able to tell the cache to
  test the full length of input 'name'.  I also had to be able to tell
  the cache to use the stat() function instead of the lstat() function
  for a given prefix length.

- Fixed a missed cache optimisation where you is not able to cache a
  symlink or a none existing directory, but still wants to cache the
  real directory part.  Fixed a second missed cache optimisation where
  the 'name' argument is a (complete) substring of the cache on a path
  component basis.

- The cache can now also return LSTAT_REG for a regular file, but
  please note that it can only cache the directory part of that file.

- Since the changes made to the create_directories() function now is
  small and simple (just had to change an if-test, remove 2 unused
  variables and update the comments), patch 2/2 is gone and included
  into patch 1/1.

- More updates to the comments in the source code.

- Thanks to Pete Harlan for spelling fixes to the commit message!

I think that the patch starts to be in a good shape now, but please
comment on it, even if it is just a rename of a variable/function name
to increase the readability or a spelling fix!

For the fun of it, I did a test with 'git chekcout-index -q --all
--prefix=<a path containing 29 components>' from the my-v2.6.27 Linux
branch, and the savings was really huge: the numbers of lstat() and
stat() calls dropped from 778 145 to 26 674, and the real time also
dropped from 52 to 32 seconds on my laptop!  :-)

-----

I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.

After some more investigation and thinking, I have made 1 patch and
been able to optimise away over 40% of all lstat() calls in some cases
for the 'git checkout' command.  Also, if you use a large path to the
'--prefix' argument to the 'git checkout-index' command, and you have
lots of files, the savings can be really huge!

The patch is against git master, and the git 'make test' test suite
still passes after the patch.  To document the improvement, below is
some numbers, which compares before and after the patch. To reproduce
the numbers:

- git clone the Linux git tree to be able to get the Linux tags
  'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25

Then, when the current branch is 'my-v2.6.25', do:

  strace -o strace_to27 -T git checkout -q my-v2.6.27

And then pretty print the 'strace_to27' file.  Below is the numbers
from the current git version (before the patch).  Notice that we do an
lstat() call on the "arch" directory over 6000 times!

TOTAL      185151 100.000% OK:165544 NOT: 19607  11.136001 sec   60 usec/call
lstat64    120954  65.327% OK:107013 NOT: 13941   5.388727 sec   45 usec/call
  strings  120954 tot  30163 uniq   4.010 /uniq   5.388727 sec   45 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   2.740520 sec   45 usec/call
  dirs      45522 tot   1436 uniq  31.701 /uniq   1.994448 sec   44 usec/call
  errors    13941 tot   5189 uniq   2.687 /uniq   0.653759 sec   47 usec/call
             6297   5.206% OK:  6297 NOT:     0  "arch"
             4544   3.757% OK:  4544 NOT:     0  "drivers"
             1816   1.501% OK:  1816 NOT:     0  "arch/arm"
             1499   1.239% OK:  1499 NOT:     0  "include"
              912   0.754% OK:   912 NOT:     0  "arch/powerpc"
              764   0.632% OK:   764 NOT:     0  "fs"
              746   0.617% OK:   746 NOT:     0  "drivers/net"
              662   0.547% OK:   662 NOT:     0  "net"
              652   0.539% OK:   325 NOT:   327  "arch/sparc/include"
              636   0.526% OK:   636 NOT:     0  "drivers/media"
              606   0.501% OK:   606 NOT:     0  "include/linux"
              533   0.441% OK:   533 NOT:     0  "arch/sh"
              522   0.432% OK:   260 NOT:   262  "arch/powerpc/include"
              488   0.403% OK:   243 NOT:   245  "arch/sh/include"
              413   0.341% OK:   413 NOT:     0  "arch/sparc"
              390   0.322% OK:   390 NOT:     0  "arch/x86"
              383   0.317% OK:   383 NOT:     0  "Documentation"
              370   0.306% OK:   184 NOT:   186  "arch/ia64/include"
              366   0.303% OK:   366 NOT:     0  "drivers/media/video"
              348   0.288% OK:   173 NOT:   175  "arch/arm/include"

Here is the numbers after applying the patch.  Notice how nice the top
20 entries list now looks!

TOTAL      134155 100.000% OK:122102 NOT: 12053  11.069389 sec   83 usec/call
lstat64     69876  52.086% OK: 63491 NOT:  6385   3.410007 sec   49 usec/call
  strings   69876 tot  30163 uniq   2.317 /uniq   3.410007 sec   49 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   3.023238 sec   49 usec/call
  dirs       2000 tot   1436 uniq   1.393 /uniq   0.085953 sec   43 usec/call
  errors     6385 tot   5189 uniq   1.230 /uniq   0.300816 sec   47 usec/call
                4   0.006% OK:     4 NOT:     0  ".gitignore"
                4   0.006% OK:     4 NOT:     0  ".mailmap"
                4   0.006% OK:     4 NOT:     0  "CREDITS"
                4   0.006% OK:     4 NOT:     0  "Documentation/00-INDEX"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-block"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-firmware-acpi"
                4   0.006% OK:     4 NOT:     0  "Documentation/CodingStyle"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-API.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-mapping.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/Makefile"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/gadget.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-api.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-locking.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs-guide.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs_example.c"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/rapidio.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/s390-drivers.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/uio-howto.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/videobook.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/writing_usb_driver.tmpl"

Comments?


Kjetil Barvik (1):
  more cache effective symlink/directory detection

 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   24 +++++++-
 diff-lib.c             |    1 +
 entry.c                |   39 +++++-------
 symlinks.c             |  158 +++++++++++++++++++++++++++++++++++-------------
 unpack-trees.c         |    6 +-
 8 files changed, 162 insertions(+), 69 deletions(-)

^ permalink raw reply

* [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-09 19:05 UTC (permalink / raw)
  To: git; +Cc: Pete Harlan, Linus Torvalds, Junio C Hamano, Kjetil Barvik
In-Reply-To: <1231527954-868-1-git-send-email-barvik@broadpark.no>

Changes includes the following:

- The cache functionality is more effective.  Previously when A/B/C/D
  was in the cache and A/B/C/E/file.c was called for, there was no
  match at all from the cache.  Now we use the fact that the paths
  "A", "A/B" and "A/B/C" are already tested, and we only need to do an
  lstat() call on "A/B/C/E".

- We only cache/store the last path regardless of its type.  Since the
  cache functionality is always used with alphabetically sorted names
  (at least it seems so for me), there is no need to store both the
  last symlink-leading path and the last real-directory path.  Note
  that if the cache is not called with (mostly) alphabetically sorted
  names, neither the old, nor this new one, would be very effective.

- We also can cache the fact that a directory does not exist.
  Previously we could end up doing lots of lstat() calls for a removed
  directory which previously contained lots of files.  Since we
  already have simplified the cache functionality and only store the
  last path (see above), this new functionality was easy to add.

- Previously, when symlink A/B/C/S was cached/stored in the
  symlink-leading path, and A/B/C/file.c was called for, it was not
  easy to use the fact that we already knew that the paths "A", "A/B"
  and "A/B/C" are real directories.  Since we now only store one
  single path (the last one), we also get similar logic for free
  regarding the new "non-existing-directory-cache".

- Avoid copying the first path components of the name 2 zillion times
  when we test new path components.  Since we always cache/store the
  last path, we can copy each component as we test those directly into
  the cache.  Previously we ended up doing a memcpy() for the full
  path/name right before each lstat() call, and when updating the
  cache for each time we have tested a new path component.

- We also use less memory, that is, PATH_MAX bytes less memory on the
  stack and PATH_MAX bytes less memory on the heap.

- Introduce a 3rd argument, 'unsigned int track_flags', to the
  cache-test function, check_lstat_cache().  This new argument can be
  used to tell the cache functionality which types of directories
  should be cached.  This argument can also be used to tell the cache
  to always test the complete length of 'name' (add 'LSTAT_FULLPATH').

- Introduce a 4th argument, 'prefix_len_stat_func', which tells the
  length of a prefix, where the cache should use the stat() function
  instead of the lstat() function to test each path component.  This
  can for instance be useful at some places in the source code to
  handle the --prefix argument to the 'git checkout-index' command.

- Also introduce a 'void clear_lstat_cache(void)' function, which
  should be used to clean the cache before usage.  If for instance,
  you have changed the types of directories which should be cached,
  the cache could contain a path which was not wanted.

We also start to use the lstat_cache() function inside the
'create_directories()' function inside entry.c, and we save really
huge amounts of calls to the stat()/lstat() functions in some cases.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 870961e... M	builtin-add.c
:100644 100644 a8f75ed... d3d001a... M	builtin-apply.c
:100644 100644 5604977... 8907219... M	builtin-update-index.c
:100644 100644 231c06d... 0bf31d3... M	cache.h
:100644 100644 ae96c64... c9caa0e... M	diff-lib.c
:100644 100644 aa2ee46... 293400c... M	entry.c
:100644 100644 5a5e781... 314ba4f... M	symlinks.c
:100644 100644 54f301d... c3d1429... M	unpack-trees.c
 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   24 +++++++-
 diff-lib.c             |    1 +
 entry.c                |   39 +++++-------
 symlinks.c             |  158 +++++++++++++++++++++++++++++++++++-------------
 unpack-trees.c         |    6 +-
 8 files changed, 162 insertions(+), 69 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..870961e8ca4e3d6f9333020083d0a232bccd542c 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -225,6 +225,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, builtin_add_options,
 			  builtin_add_usage, 0);
+	clear_symlink_cache();
 	if (patch_interactive)
 		add_interactive = 1;
 	if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..d3d001a96be6e502d6338af4467f7c313370d78e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -3154,6 +3154,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
 	if (apply_default_whitespace)
 		parse_whitespace_option(apply_default_whitespace);
 
+	clear_symlink_cache();
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..8907219fb9cb438113e29ee17854edb5dd4baa4d 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -581,6 +581,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (entries < 0)
 		die("cache corrupted");
 
+	clear_symlink_cache();
 	for (i = 1 ; i < argc; i++) {
 		const char *path = argv[i];
 		const char *p;
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..0bf31d3e8903a658927938f6da45c77dd289c94a 100644
--- a/cache.h
+++ b/cache.h
@@ -719,7 +719,29 @@ struct checkout {
 };
 
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
+
+#define LSTAT_REG      (1u << 0)
+#define LSTAT_DIR      (1u << 1)
+#define LSTAT_NOENT    (1u << 2)
+#define LSTAT_SYMLINK  (1u << 3)
+#define LSTAT_LSTATERR (1u << 4)
+#define LSTAT_ERR      (1u << 5)
+#define LSTAT_FULLPATH (1u << 6)
+extern unsigned int lstat_cache(int len, const char *name,
+				unsigned int track_flags, int prefix_len_stat_func);
+extern void clear_lstat_cache(void);
+static inline unsigned int has_symlink_leading_path(int len, const char *name)
+{
+	return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR, -1) &
+		LSTAT_SYMLINK;
+}
+#define clear_symlink_cache() clear_lstat_cache()
+static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+	return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR, -1) &
+		(LSTAT_SYMLINK|LSTAT_NOENT);
+}
+#define clear_symlink_or_noent_cache() clear_lstat_cache()
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..c9caa0e6ef0f4a8ee8b850869ef6d0f52b712385 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -69,6 +69,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = active_nr;
 	symcache[0] = '\0';
+	clear_symlink_cache();
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		unsigned int oldmode, newmode;
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..293400cf5be63fd66b797a68e17bf953c600fe99 100644
--- a/entry.c
+++ b/entry.c
@@ -8,35 +8,28 @@ static void create_directories(const char *path, const struct checkout *state)
 	const char *slash = path;
 
 	while ((slash = strchr(slash+1, '/')) != NULL) {
-		struct stat st;
-		int stat_status;
-
 		len = slash - path;
 		memcpy(buf, path, len);
 		buf[len] = 0;
 
-		if (len <= state->base_dir_len)
-			/*
-			 * checkout-index --prefix=<dir>; <dir> is
-			 * allowed to be a symlink to an existing
-			 * directory.
-			 */
-			stat_status = stat(buf, &st);
-		else
-			/*
-			 * if there currently is a symlink, we would
-			 * want to replace it with a real directory.
-			 */
-			stat_status = lstat(buf, &st);
-
-		if (!stat_status && S_ISDIR(st.st_mode))
+		/* For 'checkout-index --prefix=<dir>', <dir> is
+		 * allowed to be a symlink to an existing directory,
+		 * therefore we must give 'state->base_dir_len' to the
+		 * cache, such that we test path components of the
+		 * prefix with stat() instead of lstat()
+		 *
+		 * We must also tell the cache to test the complete
+		 * length of the buffer (the '|LSTAT_FULLPATH' part).
+		 */
+		if (lstat_cache(len, buf, LSTAT_DIR|LSTAT_FULLPATH,
+				state->base_dir_len) &
+		    LSTAT_DIR)
 			continue; /* ok, it is already a directory. */
 
-		/*
-		 * We know stat_status == 0 means something exists
-		 * there and this mkdir would fail, but that is an
-		 * error codepath; we do not care, as we unlink and
-		 * mkdir again in such a case.
+		/* If this mkdir() would fail, it could be that there
+		 * is already a symlink or something else exists
+		 * there, therefore we then try to unlink it and try
+		 * one more time to create the directory.
 		 */
 		if (mkdir(buf, 0777)) {
 			if (errno == EEXIST && state->force &&
diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..314ba4f273f9f776a130b5e5b48c5bda0ca9beed 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,136 @@
 #include "cache.h"
 
-struct pathname {
-	int len;
-	char path[PATH_MAX];
-};
+static char         cache_path[PATH_MAX + 2];
+static int          cache_len   = 0;
+static unsigned int cache_flags = 0;
 
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int greatest_match_lstat_cache(int len, const char *name)
 {
-	int match_len = match->len;
-	return (len > match_len &&
-		name[match_len] == '/' &&
-		!memcmp(name, match->path, match_len)) ? match_len : 0;
-}
+	int max_len, match_len = 0, i = 0;
 
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
-	if (len < PATH_MAX) {
-		match->len = len;
-		memcpy(match->path, name, len);
-		match->path[len] = 0;
+	max_len = len < cache_len ? len : cache_len;
+	while (i < max_len && name[i] == cache_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
 	}
+	if (i == cache_len && len > cache_len && name[cache_len] == '/')
+		match_len = cache_len;
+	else if (i == len && len < cache_len && cache_path[len] == '/')
+		match_len = len;
+	else if (i == len && i == cache_len)
+		match_len = len;
+	return match_len;
 }
 
-int has_symlink_leading_path(int len, const char *name)
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is can be indicated by the 'track_flags' argument, which also
+ * can be used to indicate that we should always check the full path.
+ *
+ * The 'prefix_len_stat_func' parameter can be used to set the length
+ * of the prefix, where the cache should use the stat() function
+ * instead of the lstat() function to test each path component.
+ */
+unsigned int lstat_cache(int len, const char *name,
+			 unsigned int track_flags, int prefix_len_stat_func)
 {
-	static struct pathname link, nonlink;
-	char path[PATH_MAX];
+	int match_len, last_slash, last_slash_dir, max_len, ret;
+	unsigned int match_flags, ret_flags, save_flags;
 	struct stat st;
-	char *sp;
-	int known_dir;
 
-	/*
-	 * See if the last known symlink cache matches.
+	/* Check if match from the cache for 2 "excluding" path types.
+	 */
+	match_len = last_slash = greatest_match_lstat_cache(len, name);
+	match_flags = cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+	if (match_flags && match_len == cache_len)
+		return match_flags;
+
+	/* If 'name' is a substring of the cache on a path component
+	 * basis, and a directory is cached, we return immediately.
 	 */
-	if (match_pathname(len, name, &link))
-		return 1;
+	match_flags = cache_flags & track_flags & LSTAT_DIR;
+	if (match_flags && match_len == len)
+		return match_flags;
 
-	/*
-	 * Get rid of the last known directory part
+	/* Okay, no match from the cache so far, so now we have to
+	 * check the rest of the path components.
 	 */
-	known_dir = match_pathname(len, name, &nonlink);
+	ret_flags = LSTAT_DIR;
+	last_slash_dir = last_slash;
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	while (match_len < max_len) {
+		do {
+			cache_path[match_len] = name[match_len];
+			match_len++;
+		} while (match_len < max_len && name[match_len] != '/');
+		if (match_len >= max_len && !(track_flags & LSTAT_FULLPATH))
+			break;
+		last_slash = match_len;
+		cache_path[last_slash] = '\0';
 
-	while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
-		int thislen = sp - name ;
-		memcpy(path, name, thislen);
-		path[thislen] = 0;
+		if (last_slash <= prefix_len_stat_func)
+			ret = stat(cache_path, &st);
+		else
+			ret = lstat(cache_path, &st);
 
-		if (lstat(path, &st))
-			return 0;
-		if (S_ISDIR(st.st_mode)) {
-			set_pathname(thislen, path, &nonlink);
-			known_dir = thislen;
+		if (ret) {
+			ret_flags = LSTAT_LSTATERR;
+			if (errno == ENOENT)
+				ret_flags |= LSTAT_NOENT;
+		} else if (S_ISDIR(st.st_mode)) {
+			last_slash_dir = last_slash;
 			continue;
-		}
-		if (S_ISLNK(st.st_mode)) {
-			set_pathname(thislen, path, &link);
-			return 1;
+		} else if (S_ISLNK(st.st_mode)) {
+			ret_flags = LSTAT_SYMLINK;
+		} else if (S_ISREG(st.st_mode)) {
+			ret_flags = LSTAT_REG;
+		} else {
+			ret_flags = LSTAT_ERR;
 		}
 		break;
 	}
-	return 0;
+
+	/* At the end update the cache.  Note that max 3 different
+	 * path types, LSTAT_NOENT, LSTAT_SYMLINK and LSTAT_DIR, can
+	 * be cached for the moment!
+	 */
+	save_flags = ret_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+	if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
+		cache_path[last_slash] = '\0';
+		cache_len   = last_slash;
+		cache_flags = save_flags;
+	} else if (track_flags & LSTAT_DIR &&
+		   last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
+		/* We have separate test for the directory case, since
+		 * it could be that we have found a symlink or none
+		 * existing directory, and the track_flags says that
+		 * we can not cache this fact, and the cache would
+		 * then have been empty in this case.
+		 *
+		 * But, if we is allowed to track real directories, we
+		 * can still cache the path components before the last
+		 * one (the found symlink or none existing component).
+		 */
+		cache_path[last_slash_dir] = '\0';
+		cache_len   = last_slash_dir;
+		cache_flags = LSTAT_DIR;
+	} else {
+		clear_lstat_cache();
+	}
+	return ret_flags;
+}
+
+/*
+ * Before usage of the check_lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+	cache_path[0] = '\0';
+	cache_len     = 0;
+	cache_flags   = 0;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..c3d14294aaaefb9e40c99f174b4f5f41e5fad635 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
 	char *cp, *prev;
 	char *name = ce->name;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return;
 	if (unlink(name))
 		return;
@@ -105,6 +105,7 @@ static int check_updates(struct unpack_trees_options *o)
 		cnt = 0;
 	}
 
+	clear_symlink_or_noent_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -118,6 +119,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 
+	clear_lstat_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -584,7 +586,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (o->index_only || o->reset || !o->update)
 		return 0;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* Re: Curious about details of optimization of object database...
From: Boyd Stephen Smith Jr. @ 2009-01-09 19:07 UTC (permalink / raw)
  To: chris; +Cc: git
In-Reply-To: <20090109174623.GC12552@seberino.org>

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On Friday 2009 January 09 11:46:23 chris@seberino.org wrote:
>I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
>tree.

It's even more than that.  A commit object contains its message, the SHA of 
the tree, and zero or more SHAs for its parents.

>Can anyone say, in a few sentences, how git avoids needing to keep multiple
>slightly different copies of entire files without just storing lots of
>patches/diffs?

Loose objects can have large swaths of duplicated data.  However, git also 
supports storing objects in a packed format, which uses delta compression to 
reduce the duplication to close to nothing.

Some examples:
Sizes are from "du -sh .git ."; The .git directory stores all the objects as 
well as the repository configuration, refs, reflogs, etc.  The . directory 
has .git and a clean checkout of master.

The LinuxPMI (http://linuxpmi.org/) tree:
41M     .git
83M     .
(So, the storage is actually a bit smaller than the checkout; 984 objects; 140 
commits)

A small project between me an my flatmates:
309K    .git
3.6M    .
(Here, the storage is significantly smaller than the checkout; 786 objects; 
155 commits)

My repository that tracks my dotfiles:
124K    .git
176K    .
(113 objects; 28 commits)
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH] filter-branch: add git_commit_non_empty_tree and --prune-empty.
From: Jay Soffian @ 2009-01-09 19:29 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Junio C Hamano, git, pasky, srabbelier
In-Reply-To: <20081103151826.GJ13930@artemis.corp>

On Mon, Nov 3, 2008 at 10:18 AM, Pierre Habouzit <madcoder@debian.org> wrote:
> On Mon, Nov 03, 2008 at 09:27:29AM +0000, Pierre Habouzit wrote:
>> On Mon, Nov 03, 2008 at 04:58:44AM +0000, Junio C Hamano wrote:

Bump, http://thread.gmane.org/gmane.comp.version-control.git/99440/

(I'd like to see this included. Having a bunch of empty commits after
using filter-branch to remove unwanted files from history is, er,
sub-optimal, so seems like it might even be default behavior?)

j.

^ permalink raw reply

* [PATCH] Get format-patch to show first commit after root commit
From: Nathan W. Panike @ 2009-01-09 19:35 UTC (permalink / raw)
  To: git; +Cc: Nathan W. Panike

Currently, the command

git format-patch -1 e83c5163316f89bfbde

in the git repository creates an empty file.  Instead, one is
forced to do

git format-patch -1 --root e83c5163316f89bfbde

This seems arbitrary.  This patch fixes this case, so that

git format-patch -1 e83c5163316f89bfbde

will produce an actual patch.
---
 builtin-log.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 4a02ee9..5e7b61f 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -977,6 +977,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		list[nr - 1] = commit;
 	}
 	total = nr;
+	if (total == 1 && !list[0]->parents)
+		rev.show_root_diff=1;
 	if (!keep_subject && auto_number && total > 1)
 		numbered = 1;
 	if (numbered)
-- 
1.6.1.76.gc123b.dirty

^ permalink raw reply related

* Assertion Failed when importing svn into Git
From: 4jxDQ6FQee2H @ 2009-01-09 19:32 UTC (permalink / raw)
  To: git

Hi,


I'm trying to import my SVN repository into GIT and its throwing an
Assertion Failed error message in text_delta.c  Can someone help me?

The assertion failed error occurs when running 'svn diff -r 288:289'
which contains only one change - a change made to a binary file. That
binary file is the Eclipse dataModel file, both revs (288, 289) have
the file marked as application/octet-stream using the svn:mime-type
property


subversion - 1.5.4
git        - 1.6.0.6

The SVN repository was started using 1.4.6, if that matters

Here's the exact error message:

-------
$ svn diff -r 288:289
svn: subversion/libsvn_delta/text_delta.c:609: apply_window: Assertion
`window->sview_len == 0 || (window->sview_offset >= ab->sbuf_offset &&
(window->sview_offset + window->sview_len >= ab->sbuf_offset +
ab->sbuf_len))' failed.
Aborted
-------


Thanks!

^ permalink raw reply

* Re: Assertion Failed when importing svn into Git
From: 4jxDQ6FQee2H @ 2009-01-09 19:43 UTC (permalink / raw)
  To: 4jxDQ6FQee2H; +Cc: git
In-Reply-To: <20090109133234.345dacb9@family.dyweni.com>

Hi,

For more testing, I setup a brand new SVN repository (using v1.5.4).

In the old SVN Repo, I checked out the problematic file
(.cache/.dataModel) from rev 288 and rev 289.


In the new SVN Repo, I added the problematic file (from rev 288) and
committed, which became Rev 4.

Then I copied the problematic file (rev rev 289) ontop of that file and
commited, which became Rev 5.

I'm successfully able to do a 'svn diff -r 4:5'.

Does this imply that I have corruption in my old SVN Repository?

Or is something being lost when checking out the file from Rev 289 that
does break SVN in my test?





> Hi,
> 
> 
> I'm trying to import my SVN repository into GIT and its throwing an
> Assertion Failed error message in text_delta.c  Can someone help me?
> 
> The assertion failed error occurs when running 'svn diff -r 288:289'
> which contains only one change - a change made to a binary file. That
> binary file is the Eclipse dataModel file, both revs (288, 289) have
> the file marked as application/octet-stream using the svn:mime-type
> property
> 
> 
> subversion - 1.5.4
> git        - 1.6.0.6
> 
> The SVN repository was started using 1.4.6, if that matters
> 
> Here's the exact error message:
> 
> -------
> $ svn diff -r 288:289
> svn: subversion/libsvn_delta/text_delta.c:609: apply_window: Assertion
> `window->sview_len == 0 || (window->sview_offset >= ab->sbuf_offset &&
> (window->sview_offset + window->sview_len >= ab->sbuf_offset +
> ab->sbuf_len))' failed.
> Aborted
> -------
> 
> 
> Thanks!

^ permalink raw reply

* Re: Assertion Failed when importing svn into Git
From: 4jxDQ6FQee2H @ 2009-01-09 20:05 UTC (permalink / raw)
  To: 4jxDQ6FQee2H; +Cc: git
In-Reply-To: <20090109134307.435ed56f@family.dyweni.com>

Hi,

I have resolved this issue.

There was either corruption in the repository or an incompatibility
between the 1.4.6 and the 1.5.4 formats.

A dump / reload of the repository has resolved the issue.

I can run 'svn diff -r 288:289' successfully on the new repository.

Thanks!



> Hi,
> 
> For more testing, I setup a brand new SVN repository (using v1.5.4).
> 
> In the old SVN Repo, I checked out the problematic file
> (.cache/.dataModel) from rev 288 and rev 289.
> 
> 
> In the new SVN Repo, I added the problematic file (from rev 288) and
> committed, which became Rev 4.
> 
> Then I copied the problematic file (rev rev 289) ontop of that file
> and commited, which became Rev 5.
> 
> I'm successfully able to do a 'svn diff -r 4:5'.
> 
> Does this imply that I have corruption in my old SVN Repository?
> 
> Or is something being lost when checking out the file from Rev 289
> that does break SVN in my test?
> 
> 
> 
> 
> 
> > Hi,
> > 
> > 
> > I'm trying to import my SVN repository into GIT and its throwing an
> > Assertion Failed error message in text_delta.c  Can someone help me?
> > 
> > The assertion failed error occurs when running 'svn diff -r 288:289'
> > which contains only one change - a change made to a binary file.
> > That binary file is the Eclipse dataModel file, both revs (288,
> > 289) have the file marked as application/octet-stream using the
> > svn:mime-type property
> > 
> > 
> > subversion - 1.5.4
> > git        - 1.6.0.6
> > 
> > The SVN repository was started using 1.4.6, if that matters
> > 
> > Here's the exact error message:
> > 
> > -------
> > $ svn diff -r 288:289
> > svn: subversion/libsvn_delta/text_delta.c:609: apply_window:
> > Assertion `window->sview_len == 0 || (window->sview_offset >=
> > ab->sbuf_offset && (window->sview_offset + window->sview_len >=
> > ab->sbuf_offset + ab->sbuf_len))' failed.
> > Aborted
> > -------
> > 
> > 
> > Thanks!  

^ permalink raw reply

* Re: fetch branch blacklist
From: Jakub Narebski @ 2009-01-09 20:23 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <87r63ek6cw.fsf@jidanni.org>

jidanni@jidanni.org writes:

> If one wants to always fetch all except one remote branch, one cannot
> just blacklist it, but must instead whitelist all the rest.
> $ git branch -rd origin/man origin/html
> Deleted remote branch origin/man.
> Deleted remote branch origin/html.
> Plus I edited them out of FETCH_HEAD. Nonetheless, back from the dead:
> $ git pull
> From git://git.kernel.org/pub/scm/git/git
>  * [new branch]      html       -> origin/html
>  * [new branch]      man        -> origin/man
> The only solution is to change .git/config:
> [remote "origin"]
> 	url = git://git.kernel.org/pub/scm/git/git.git
> #	fetch = +refs/heads/*:refs/remotes/origin/*
> 	fetch = +refs/heads/maint:refs/remotes/origin/maint
> 	fetch = +refs/heads/master:refs/remotes/origin/master
> 	fetch = +refs/heads/next:refs/remotes/origin/next
> 	fetch = +refs/heads/pu:refs/remotes/origin/pu
> 	fetch = +refs/heads/todo:refs/remotes/origin/todo

Well, you can always use hooks for that (see for example
underdocumented contrib/hooks/update-paranoid)... or you can try to
scratch that itch yourself.  gitignore supports inverse (negated)
patterns (!<pattern>), so there is some code dealing with
"blacklisting".  I would propose using the same '!' character, or
perhaps one of forbidden characters (see git-check-ref-format(1)),
i.e.

 [remote "origin"]
 	url = git://git.kernel.org/pub/scm/git/git.git
 	fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = !refs/heads/html
        fetch = !refs/heads/man

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply

* Re: [PATCH] Get format-patch to show first commit after root commit
From: Alexander Potashev @ 2009-01-09 20:29 UTC (permalink / raw)
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <1231529725-19767-1-git-send-email-nathan.panike@gmail.com>

Hello!

I experienced this problem today while preparing a simple patch for
reply in "[PATCH 2/2] Use is_pseudo_dir_name everywhere" thread.
I used a workaround: add a file, commit, remove it, commit, add it once
again, commit and after all format-patch.

On 13:35 Fri 09 Jan     , Nathan W. Panike wrote:
> Currently, the command
> 
> git format-patch -1 e83c5163316f89bfbde
> 
> in the git repository creates an empty file.  Instead, one is
> forced to do
> 
> git format-patch -1 --root e83c5163316f89bfbde
> 
> This seems arbitrary.  This patch fixes this case, so that
> 
> git format-patch -1 e83c5163316f89bfbde

Your patch doesn't solve the problem if there are more than one commit
(say, 2 commits) and you run 'git format-patch -2'. Even with your patch
format-patch writes an empty patch file corresponding to the root commit
(actually, it creates 2 patches, but the first is empty).

Please, correct me if I'm wrong.

> 
> will produce an actual patch.
> ---
>  builtin-log.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/builtin-log.c b/builtin-log.c
> index 4a02ee9..5e7b61f 100644
> --- a/builtin-log.c
> +++ b/builtin-log.c
> @@ -977,6 +977,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  		list[nr - 1] = commit;
>  	}
>  	total = nr;
> +	if (total == 1 && !list[0]->parents)
> +		rev.show_root_diff=1;
>  	if (!keep_subject && auto_number && total > 1)
>  		numbered = 1;
>  	if (numbered)
> -- 
> 1.6.1.76.gc123b.dirty

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox