Git development


* [PATCH] doc: improve grammar in git-update-index
From: Anthony Sottile @ 2018-12-14 21:25 UTC
  To: git; +Cc: Anthony Sottile

Signed-off-by: Anthony Sottile <asottile@umich.edu>
---
 Documentation/git-update-index.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 1c4d146a4..9c03ca167 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -326,7 +326,7 @@ inefficient `lstat(2)`.  If your filesystem is one of them, you
 can set "assume unchanged" bit to paths you have not changed to
 cause Git not to do this check.  Note that setting this bit on a
 path does not mean Git will check the contents of the file to
-see if it has changed -- it makes Git to omit any checking and
+see if it has changed -- it means Git will skip any checking and
 assume it has *not* changed.  When you make changes to working
 tree files, you have to explicitly tell Git about it by dropping
 "assume unchanged" bit, either before or after you modify them.
-- 
2.17.1
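The documentation hunk above describes what the "assume unchanged" bit does. The following is a minimal sketch of that decision. It assumes a heavily simplified index entry, and `struct toy_entry`, `struct toy_stat` and `toy_entry_modified()` are hypothetical names, not Git's real index structures. When the bit is set, the stat comparison is skipped and is not replaced by a content check.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, heavily simplified index entry (not Git's cache_entry). */
struct toy_entry {
	time_t mtime;          /* modification time recorded in the index */
	long size;             /* file size recorded in the index */
	bool assume_unchanged; /* models the "assume unchanged" bit */
};

/* Hypothetical result of lstat(2) on the working tree file. */
struct toy_stat {
	time_t mtime;
	long size;
};

/*
 * Returns true if the path should be reported as modified.
 * With the bit set nothing is compared at all: the bit does not
 * trigger a content check, it suppresses the stat check.
 */
static bool toy_entry_modified(const struct toy_entry *ce,
			       const struct toy_stat *st)
{
	assert(ce && st);
	if (ce->assume_unchanged)
		return false; /* assume it has *not* changed */
	return ce->mtime != st->mtime || ce->size != st->size;
}
```

Because the bit short-circuits the check, an edited file whose entry has the bit set keeps being reported as unchanged until the bit is dropped, for example with `git update-index --no-assume-unchanged <path>`.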



* Re: Bug in lineendings handling that prevents resetting checking out, rebasing etc
From: Mr&Mrs D @ 2018-12-14 21:32 UTC
  To: john.a.passaro; +Cc: git
In-Reply-To: <CAJdN7KitOpH=WFJW2SgU8mt75pFzF2mhD0TrCkyfnYugRdTkxw@mail.gmail.com>

$ git --version
git version 2.19.2

macOS Mojave

Hmm, the latest version at https://git-scm.com/download/mac seems to
be this one. Where do I get 2.20?

Thanks!
On Fri, Dec 14, 2018 at 4:22 PM John Passaro <john.a.passaro@gmail.com> wrote:
>
> On Fri, Dec 14, 2018 at 4:08 PM Mr&Mrs D <the.ubik@gmail.com> wrote:
> >
> > Hi all,
> >
> > I maintain a python project you can clone from:
> >
> > git@github.com:wrye-bash/wrye-bash.git
> >
> > For reasons unknown git sees a particular file as changed
> > (Mopy/Docs/Bash Readme Template.html, sometimes others too). This file
> > was probably committed to the svn repository this git repo was created
> > from with CRLF line endings. When we moved to git we added a
> > gitattributes file (
> > https://github.com/wrye-bash/wrye-bash/blob/dev/.gitattributes ) and
> > this file was edited to explicitly state htms are text - all to no
> > avail. From time to time - on windows - as in when checking out an old
> > commit - git would see that file as changed. The only workaround that
> > worked for me was
> >
> >     git rm -r . --cached -q && git reset --hard
> >
> > For more details and discussion see this SO question I posted almost
> > five years ago:
> >
> > https://stackoverflow.com/questions/21122094/git-line-endings-cant-stash-reset-and-now-cant-rebase-over-spurious-line-en
> >
> > I used to work in windows and the bug was tolerable as there was that
> > workaround. Now I moved to mac and no workaround works anymore - we
> > have a special page on our wiki  with workarounds for this one btw:
> >
> > https://github.com/wrye-bash/wrye-bash/wiki/%5Bgit%5D-Issues-with-line-endings-preventing-checking,-merge,-etc
> >
> > Well after 5 years and countless hours trying to solve this I reach
> > out to you guys and girls - _this is a full-time bug in git line
> > endings handling_. When someone issues a git reset --hard this should
> > work no matter what - well it does not. So this bug may be really a
> > can of worms.
> >
> > Please someone clone this repo on linux or mac - probably just cloning
> > will have the files appear as changed (by the way hitting refresh on
> > git gui I have different sets of files appear as changed). If not then
> >
> > git checkout utumno-wip
> > git rebase -i dev
> >
> > and then select a commit to edit should be enough to trigger this bug
>
> Does not reproduce on git 2.20.0 (mac high sierra fwiw). What version of git
> are you using?
> >
> > Needless to say I am  well aware of things like `git add --renormalize
> > .` - but renormalizing is not the issue. The issue is that _files show
> > as changed and even a git reset --hard won't convince git that
> > nothing's changed_.
> >
> > $ git reset --hard
> > HEAD is now at e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
> > $ git status
> > interactive rebase in progress; onto 02ae6f26
> > Last commands done (4 commands done):
> >    pick 3a39a0c0 Monkey patch for undecodable inis:
> >    pick e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
> >   (see more in file .git/rebase-merge/done)
> > Next commands to do (19 remaining commands):
> >    edit a3a7b237 Amend last commit and linefixes:  ΕΕΕΕ
> >    edit 432fd314 fFF handle empty or malformed inis
> >   (use "git rebase --edit-todo" to view and edit)
> > You are currently editing a commit while rebasing branch 'utumno-wip'
> > on '02ae6f26'.
> >   (use "git commit --amend" to amend the current commit)
> >   (use "git rebase --continue" once you are satisfied with your changes)
> >
> > Changes not staged for commit:
> >   (use "git add <file>..." to update what will be committed)
> >   (use "git checkout -- <file>..." to discard changes in working directory)
> >
> > modified:   Mopy/Docs/Bash Readme Template.html
> >
> > Untracked files:
> >   (use "git add <file>..." to include in what will be committed)
> >
> > .DS_Store
> > .idea.7z
> >
> > no changes added to commit (use "git add" and/or "git commit -a")
> > $
> >
> > I really hope someone here can debug this
> > Thanks!


* Re: Bug in lineendings handling that prevents resetting checking out, rebasing etc
From: Torsten Bögershausen @ 2018-12-14 21:32 UTC
  To: Mr&Mrs D; +Cc: git
In-Reply-To: <CABRG_PEy9H7za9cTdXMvFB37GfDvpBvsDDoLZ5-Bpm=9NWzLiw@mail.gmail.com>

On Fri, Dec 14, 2018 at 04:04:15PM -0500, Mr&Mrs D wrote:
> Hi all,
> 
> I maintain a python project you can clone from:
> 
> git@github.com:wrye-bash/wrye-bash.git
> 
> For reasons unknown git sees a particular file as changed
> (Mopy/Docs/Bash Readme Template.html, sometimes others too). This file
> was probably committed to the svn repository this git repo was created
> from with CRLF line endings. When we moved to git we added a
> gitattributes file (
> https://github.com/wrye-bash/wrye-bash/blob/dev/.gitattributes ) and
> this file was edited to explicitly state htms are text - all to no
> avail. From time to time - on windows - as in when checking out an old
> commit - git would see that file as changed. The only workaround that
> worked for me was
> 
>     git rm -r . --cached -q && git reset --hard
> 
> For more details and discussion see this SO question I posted almost
> five years ago:
> 
> https://stackoverflow.com/questions/21122094/git-line-endings-cant-stash-reset-and-now-cant-rebase-over-spurious-line-en
> 
> I used to work in windows and the bug was tolerable as there was that
> workaround. Now I moved to mac and no workaround works anymore - we
> have a special page on our wiki  with workarounds for this one btw:
> 
> https://github.com/wrye-bash/wrye-bash/wiki/%5Bgit%5D-Issues-with-line-endings-preventing-checking,-merge,-etc
> 
> Well after 5 years and countless hours trying to solve this I reach
> out to you guys and girls - _this is a full-time bug in git line
> endings handling_. When someone issues a git reset --hard this should
> work no matter what - well it does not. So this bug may be really a
> can of worms.
> 
> Please someone clone this repo on linux or mac - probably just cloning
> will have the files appear as changed (by the way hitting refresh on
> git gui I have different sets of files appear as changed). If not then
> 
> git checkout utumno-wip
That commit is (excuse me if that sounds too harsh) messed up.
git status says
modified:   Mopy/Docs/Bash Readme Template.html

And if I dig into the EOL stuff, I run
git ls-files --eol | grep  Readme | less

And find a contradiction here:
i/crlf  w/crlf  attr/text               Mopy/Docs/Bash Readme Template.html

The attributes say "text" and the file has CRLF "in the repo"
(technically speaking, in the index), and that is an "illegal" condition
in the repo, not a bug in Git.
I didn't try the rebase as such, since the first problem needs
to be fixed before we try to move on.
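The contradiction can be illustrated with a rough sketch. It assumes a simplified model of the check-in conversion for a path marked `text`: every CRLF becomes LF, and the path looks modified when the converted working-tree content differs from the blob in the index. The `toy_*` functions are hypothetical. `toy_eol_info()` only approximates the `i/` column of `git ls-files --eol`, and the sketch ignores Git's stat-data shortcut. Real Git normally re-examines content only when the cached stat data no longer matches, so in practice the symptom can come and go with timing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rough analogue of the eol classification shown by "git ls-files --eol". */
static const char *toy_eol_info(const char *buf, size_t len)
{
	size_t lf = 0, crlf = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] != '\n')
			continue;
		if (i > 0 && buf[i - 1] == '\r')
			crlf++;
		else
			lf++;
	}
	if (crlf && lf)
		return "mixed";
	if (crlf)
		return "crlf";
	if (lf)
		return "lf";
	return "none";
}

/* Simplified check-in conversion for a "text" path: CRLF -> LF. */
static size_t toy_normalize(const char *in, size_t len, char *out)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
			continue; /* drop the CR of a CRLF pair */
		out[n++] = in[i];
	}
	return n;
}

/* A "text" path looks modified if normalized content != index blob. */
static int toy_looks_modified(const char *index_blob, const char *worktree)
{
	char normalized[256];
	size_t wlen = strlen(worktree);

	assert(wlen < sizeof(normalized));
	size_t n = toy_normalize(worktree, wlen, normalized);
	return n != strlen(index_blob) || memcmp(normalized, index_blob, n) != 0;
}
```

With a CRLF blob in the index (`i/crlf`), even a byte-identical checkout counts as modified, because normalizing it can never reproduce the CR bytes stored in the index. Once the file is committed with LF in the index (for example after `git add --renormalize .`), new commits no longer have the contradiction, while old commits stay as they are.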

So, the old commits are problematic/illegal, and they are as they are.
Such a commit cannot be fixed; whenever somebody checks it out,
there will be a problem (or two, or none, depending on the timing,
the file system...)

We cannot fix commits like b1acc012878c9fdd8b4ad610ce7eae0dcbcbcab0.
We can make new commits, and fix them.

We can fix one branch, and other branches, and merge them together.
But rebase seems to be problematic, at least to me.
What exactly do you want to do?

Can we agree to do a merge of 2 branches?
Then I can possibly help you out.





> git rebase -i dev
> 
> and then select a commit to edit should be enough to trigger this bug
> 
> Needless to say I am  well aware of things like `git add --renormalize
> .` - but renormalizing is not the issue. The issue is that _files show
> as changed and even a git reset --hard won't convince git that
> nothing's changed_.
> 
> $ git reset --hard
> HEAD is now at e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
> $ git status
> interactive rebase in progress; onto 02ae6f26
> Last commands done (4 commands done):
>    pick 3a39a0c0 Monkey patch for undecodable inis:
>    pick e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
>   (see more in file .git/rebase-merge/done)
> Next commands to do (19 remaining commands):
>    edit a3a7b237 Amend last commit and linefixes:  ΕΕΕΕ
>    edit 432fd314 fFF handle empty or malformed inis
>   (use "git rebase --edit-todo" to view and edit)
> You are currently editing a commit while rebasing branch 'utumno-wip'
> on '02ae6f26'.
>   (use "git commit --amend" to amend the current commit)
>   (use "git rebase --continue" once you are satisfied with your changes)
> 
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   Mopy/Docs/Bash Readme Template.html
> 
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 
> .DS_Store
> .idea.7z
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> $
> 
> I really hope someone here can debug this
> Thanks!


* Re: Git blame performance on files with a lot of history
From: Derrick Stolee @ 2018-12-14 21:31 UTC
  To: Clement Moyroud, git
In-Reply-To: <CABXAcUzoNJ6s3=2xZfWYQUZ_AUefwP=5UVUgMnafKHHtufzbSA@mail.gmail.com>

On 12/14/2018 1:29 PM, Clement Moyroud wrote:
> My group at work is migrating a CVS repo to Git. The biggest issue we
> face so far is the performance of git blame, especially compared to
> CVS on the same file. One file especially causes us trouble: it's a
> 30k lines file with 25 years of history in 3k+ commits. The complete
> repo has 200k+ commits over that same period of time.

I think the 30k lines are a bigger issue than the 200k+ commits. I'm
not terribly familiar with the blame code, though.

> Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
> (without -M nor -C) takes 145s.
>
> I tried using the commit-graph with the Bloom filter, per
> https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/.

Thanks for the interest in this prototype feature. Sorry that it doesn't
appear to help you in this case. It should definitely be a follow-up
once that feature is polished to production quality.

> Looking at the blame code, it does not seem to be able to use the
> commit graph, so I tried the same rev-list command from the e-mail,
> using my own file:
>      > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y
> /path/to/git rev-list --count --full-history HEAD -- important/file.C
>      3576
>
Please double-check that you have the 'core.commitGraph' config setting 
enabled, or you will not read the commit-graph at run-time:

     git config core.commitGraph true

I see that the commit introducing GIT_TRACE_BLOOM_FILTER [1] does 
nothing if the commit-graph is not loaded.

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git/commit/adc469894b755512c9d02f099700ead2a7a78377


* Re: Bug in lineendings handling that prevents resetting checking out, rebasing etc
From: John Passaro @ 2018-12-14 21:21 UTC
  To: the.ubik; +Cc: git
In-Reply-To: <CABRG_PEy9H7za9cTdXMvFB37GfDvpBvsDDoLZ5-Bpm=9NWzLiw@mail.gmail.com>

On Fri, Dec 14, 2018 at 4:08 PM Mr&Mrs D <the.ubik@gmail.com> wrote:
>
> Hi all,
>
> I maintain a python project you can clone from:
>
> git@github.com:wrye-bash/wrye-bash.git
>
> For reasons unknown git sees a particular file as changed
> (Mopy/Docs/Bash Readme Template.html, sometimes others too). This file
> was probably committed to the svn repository this git repo was created
> from with CRLF line endings. When we moved to git we added a
> gitattributes file (
> https://github.com/wrye-bash/wrye-bash/blob/dev/.gitattributes ) and
> this file was edited to explicitly state htms are text - all to no
> avail. From time to time - on windows - as in when checking out an old
> commit - git would see that file as changed. The only workaround that
> worked for me was
>
>     git rm -r . --cached -q && git reset --hard
>
> For more details and discussion see this SO question I posted almost
> five years ago:
>
> https://stackoverflow.com/questions/21122094/git-line-endings-cant-stash-reset-and-now-cant-rebase-over-spurious-line-en
>
> I used to work in windows and the bug was tolerable as there was that
> workaround. Now I moved to mac and no workaround works anymore - we
> have a special page on our wiki  with workarounds for this one btw:
>
> https://github.com/wrye-bash/wrye-bash/wiki/%5Bgit%5D-Issues-with-line-endings-preventing-checking,-merge,-etc
>
> Well after 5 years and countless hours trying to solve this I reach
> out to you guys and girls - _this is a full-time bug in git line
> endings handling_. When someone issues a git reset --hard this should
> work no matter what - well it does not. So this bug may be really a
> can of worms.
>
> Please someone clone this repo on linux or mac - probably just cloning
> will have the files appear as changed (by the way hitting refresh on
> git gui I have different sets of files appear as changed). If not then
>
> git checkout utumno-wip
> git rebase -i dev
>
> and then select a commit to edit should be enough to trigger this bug

Does not reproduce on git 2.20.0 (macOS High Sierra, fwiw). What version of git
are you using?
>
> Needless to say I am  well aware of things like `git add --renormalize
> .` - but renormalizing is not the issue. The issue is that _files show
> as changed and even a git reset --hard won't convince git that
> nothing's changed_.
>
> $ git reset --hard
> HEAD is now at e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
> $ git status
> interactive rebase in progress; onto 02ae6f26
> Last commands done (4 commands done):
>    pick 3a39a0c0 Monkey patch for undecodable inis:
>    pick e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
>   (see more in file .git/rebase-merge/done)
> Next commands to do (19 remaining commands):
>    edit a3a7b237 Amend last commit and linefixes:  ΕΕΕΕ
>    edit 432fd314 fFF handle empty or malformed inis
>   (use "git rebase --edit-todo" to view and edit)
> You are currently editing a commit while rebasing branch 'utumno-wip'
> on '02ae6f26'.
>   (use "git commit --amend" to amend the current commit)
>   (use "git rebase --continue" once you are satisfied with your changes)
>
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
>
> modified:   Mopy/Docs/Bash Readme Template.html
>
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
>
> .DS_Store
> .idea.7z
>
> no changes added to commit (use "git add" and/or "git commit -a")
> $
>
> I really hope someone here can debug this
> Thanks!


* [PATCH v4 6/6] pack-objects: create GIT_TEST_PACK_SPARSE
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Create a test variable GIT_TEST_PACK_SPARSE to enable the sparse
object walk algorithm by default during the test suite. Enabling
this variable ensures coverage in many interesting cases, such as
shallow clones, partial clones, and missing objects.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c         | 1 +
 t/README                       | 4 ++++
 t/t5322-pack-objects-sparse.sh | 6 +++---
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 124b1bafc4..507d381153 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3331,6 +3331,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
+	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", 0);
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
 
diff --git a/t/README b/t/README
index 28711cc508..8b6dfe1864 100644
--- a/t/README
+++ b/t/README
@@ -342,6 +342,10 @@ GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
 for the index version specified.  Can be set to any valid version
 (currently 2, 3, or 4).
 
+GIT_TEST_PACK_SPARSE=<boolean> if enabled will default the pack-objects
+builtin to use the sparse object walk. This can still be overridden by
+the --no-sparse command-line argument.
+
 GIT_TEST_PRELOAD_INDEX=<boolean> exercises the preload-index code path
 by overriding the minimum number of cache entries required per thread.
 
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 8f5699bd91..e8cf41d1c6 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -36,7 +36,7 @@ test_expect_success 'setup repo' '
 '
 
 test_expect_success 'non-sparse pack-objects' '
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
@@ -70,7 +70,7 @@ test_expect_success 'duplicate a folder from f3 and commit to topic1' '
 '
 
 test_expect_success 'non-sparse pack-objects' '
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
@@ -102,7 +102,7 @@ test_expect_success 'non-sparse pack-objects' '
 		topic1			\
 		topic1^{tree}		\
 		topic1:f3 | sort >expect_objects.txt &&
-	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
 	test_cmp expect_objects.txt nonsparse_objects.txt
-- 
gitgitgadget
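The call order in `cmd_pack_objects()` determines the precedence. The environment variable only sets a default, `git_config()` runs next and can overwrite it through `pack.useSparse`, and the `--no-sparse` command-line argument still wins, as the README hunk states. The following is a minimal sketch of that precedence. `toy_bool_or_default()` and `toy_resolve_sparse()` are hypothetical, and the sketch accepts fewer boolean spellings than `git_env_bool()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: an unset value means "use the default". */
static int toy_bool_or_default(const char *value, int def)
{
	if (!value)
		return def;
	return !strcmp(value, "1") || !strcmp(value, "true") ||
	       !strcmp(value, "yes");
}

/*
 * config_value and cli_value are -1 when not given, else 0 or 1.
 * Later sources overwrite earlier ones: environment default first,
 * then config (pack.useSparse), then the command line (--[no-]sparse).
 */
static int toy_resolve_sparse(const char *env_value, int config_value,
			      int cli_value)
{
	int sparse;

	assert(config_value >= -1 && config_value <= 1);
	assert(cli_value >= -1 && cli_value <= 1);
	sparse = toy_bool_or_default(env_value, 0);
	if (config_value >= 0)
		sparse = config_value; /* git_config() runs after the env lookup */
	if (cli_value >= 0)
		sparse = cli_value;    /* command-line flag overrides both */
	return sparse;
}
```

In a real process the first argument would come from `getenv("GIT_TEST_PACK_SPARSE")`. This precedence is why the t5322 tests now pass `--no-sparse` explicitly. If the variable is set for the whole test suite, a bare `git pack-objects --revs` would otherwise use the sparse walk.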


* [PATCH v4 5/6] pack-objects: create pack.useSparse setting
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The '--sparse' flag in 'git pack-objects' changes the algorithm
used to enumerate objects to one that is faster for individual
users pushing new objects that change only a small cone of the
working directory. The sparse algorithm is not recommended for a
server, which likely sends new objects that appear across the
entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a
'--sparse' flag all the way through four levels of run_command()
calls.

If the '--no-sparse' flag is set, then this config setting is
overridden.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/pack.txt  |  9 +++++++++
 builtin/pack-objects.c         |  4 ++++
 t/t5322-pack-objects-sparse.sh | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index edac75c83f..425c73aa52 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -105,6 +105,15 @@ pack.useBitmaps::
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
 
+pack.useSparse::
+	When true, git will default to using the '--sparse' option in
+	'git pack-objects' when the '--revs' option is present. This
+	algorithm only walks trees that appear in paths that introduce new
+	objects. This can have significant performance benefits when
+	computing a pack to send a small change. However, it is possible
+	that extra objects are added to the pack-file if the included
+	commits contain certain types of direct renames.
+
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7d5b0735e3..124b1bafc4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2711,6 +2711,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.usesparse")) {
+		sparse = git_config_bool(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.threads")) {
 		delta_search_threads = git_config_int(k, v);
 		if (delta_search_threads < 0)
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 45dba6e014..8f5699bd91 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -121,4 +121,19 @@ test_expect_success 'sparse pack-objects' '
 	test_cmp expect_sparse_objects.txt sparse_objects.txt
 '
 
+test_expect_success 'pack.useSparse enables algorithm' '
+	git config pack.useSparse true &&
+	git pack-objects --stdout --revs <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_sparse_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'pack.useSparse overridden' '
+	git pack-objects --stdout --revs --no-sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
 test_done
-- 
gitgitgadget
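The new branch in `git_pack_config()` compares against the all-lowercase `"pack.usesparse"`. It does this because Git hands config callbacks keys whose section and variable names are already lowercased, while a subsection (as in `remote.<name>.url`) keeps its case. The following is a minimal sketch of that normalization and of boolean parsing roughly in the style of `git_config_bool()`. All `toy_*` names are hypothetical. The sketch normalizes inside the callback only for illustration, and its integer fallback is looser than Git's.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lowercase the section and variable name; keep a subsection as written. */
static void toy_normalize_key(const char *key, char *out, size_t outlen)
{
	const char *first = strchr(key, '.');
	const char *last = strrchr(key, '.');
	size_t len = strlen(key);

	assert(first && len < outlen);
	for (size_t i = 0; i <= len; i++) { /* also copies the NUL */
		const char *p = key + i;
		int in_subsection = p > first && p < last;
		out[i] = in_subsection ? *p : (char)tolower((unsigned char)*p);
	}
}

/* Case-insensitive string equality. */
static int toy_ieq(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b))
		a++, b++;
	return !*a && !*b;
}

/* Roughly git_config_bool(): no value is true, "" is false. */
static int toy_config_bool(const char *v)
{
	if (!v)
		return 1;
	if (!*v)
		return 0;
	if (toy_ieq(v, "true") || toy_ieq(v, "yes") || toy_ieq(v, "on"))
		return 1;
	if (toy_ieq(v, "false") || toy_ieq(v, "no") || toy_ieq(v, "off"))
		return 0;
	return strtol(v, NULL, 10) != 0; /* fall back to a number */
}

static int toy_sparse; /* stands in for 'sparse' in pack-objects.c */

/* Shaped like git_pack_config(); returns 1 if the key was handled. */
static int toy_pack_config(const char *key, const char *value)
{
	char norm[64];

	toy_normalize_key(key, norm, sizeof(norm));
	if (!strcmp(norm, "pack.usesparse")) {
		toy_sparse = toy_config_bool(value);
		return 1;
	}
	return 0;
}
```

So `pack.useSparse`, `Pack.UseSparse` and `PACK.USESPARSE` all reach the same branch, and `yes`, `on`, `true` or a bare `useSparse` line with no value all enable the algorithm.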



* [PATCH v4 4/6] revision: implement sparse algorithm
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

When enumerating objects to place in a pack-file during 'git
pack-objects --revs', we discover the "frontier" of commits
that we care about and the boundary with commits we find
uninteresting. From that point, we walk trees to discover which
trees and blobs are uninteresting. Finally, we walk trees from the
interesting commits to find the interesting objects that are
placed in the pack.

This commit introduces a new, "sparse" way to discover the
uninteresting trees. We use the perspective of a single user trying
to push their topic to a large repository. That user likely changed
a very small fraction of the paths in their working directory, but
we spend a lot of time walking all reachable trees.

The way to switch the logic to work in this sparse way is to start
caring about which paths introduce new trees. While it is not
possible to generate a diff between the frontier boundary and all
of the interesting commits, we can simulate that behavior by
inspecting all of the root trees as a whole, then recursing down
to the set of trees at each path.

We already took the first step by passing an oidset to
mark_trees_uninteresting_sparse(). We now create a dictionary
whose keys are paths and values are oidsets. We consider the set
of trees that appear at each path. While we inspect a tree, we
add its subtrees to the oidsets corresponding to the tree entry's
path. We also mark trees as UNINTERESTING if the tree we are
parsing is UNINTERESTING.

To actually improve the performance, we need to terminate our
recursion. If the oidset contains only UNINTERESTING trees, then
we do not continue the recursion. This avoids walking trees that
are likely not reachable from interesting trees. If the
oidset contains only interesting trees, then we will walk these
trees in the final stage that collects the interesting objects to
place in the pack. Thus, we only recurse if the oidset contains
both interesting and UNINTERESTING trees.
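The recursion described above can be modeled without Git's object store. The following is a toy sketch. It assumes in-memory objects in place of object IDs and small fixed-size arrays in place of `oidset` and the hashmap. `struct toy_obj`, `toy_mark_sparse()` and the other `toy_*` names are hypothetical. The sketch follows the same three steps as `mark_trees_uninteresting_sparse()` in the patch:

1. Check whether the set is mixed.
2. Group child trees by entry name while propagating UNINTERESTING.
3. Recurse into each group.

```c
#include <assert.h>
#include <string.h>

#define UNINTERESTING 1u
#define MAX_CHILDREN 8
#define MAX_SET 16

/* Hypothetical stand-in for a tree (is_tree = 1) or a blob. */
struct toy_obj {
	int is_tree;
	unsigned flags;
	int nr;                                 /* entry count, trees only */
	const char *names[MAX_CHILDREN];        /* entry names */
	struct toy_obj *children[MAX_CHILDREN]; /* entry targets */
};

/* Stand-in for an oidset: distinct objects in a small array. */
struct toy_set {
	int nr;
	struct toy_obj *items[MAX_SET];
};

/* Stand-in for path_and_oids_entry: one entry name and its trees. */
struct toy_bucket {
	const char *path;
	struct toy_set set;
};

static int toy_trees_walked; /* trees whose entries were inspected */

static void toy_set_add(struct toy_set *s, struct toy_obj *o)
{
	for (int i = 0; i < s->nr; i++)
		if (s->items[i] == o)
			return; /* already present */
	assert(s->nr < MAX_SET);
	s->items[s->nr++] = o;
}

static void toy_mark_sparse(struct toy_set *set)
{
	struct toy_bucket buckets[MAX_CHILDREN * MAX_SET];
	int nr_buckets = 0, has_interesting = 0, has_uninteresting = 0;

	/* Step 1: is this set mixed? */
	for (int i = 0; i < set->nr; i++) {
		if (set->items[i]->flags & UNINTERESTING)
			has_uninteresting = 1;
		else
			has_interesting = 1;
	}
	if (!has_interesting || !has_uninteresting)
		return; /* all one kind: stop recursing */

	/* Step 2: group child trees by entry name, propagating the flag. */
	for (int i = 0; i < set->nr; i++) {
		struct toy_obj *tree = set->items[i];

		toy_trees_walked++;
		for (int j = 0; j < tree->nr; j++) {
			struct toy_obj *child = tree->children[j];
			int b = 0;

			if (tree->flags & UNINTERESTING)
				child->flags |= UNINTERESTING;
			if (!child->is_tree)
				continue;
			while (b < nr_buckets && strcmp(buckets[b].path, tree->names[j]))
				b++;
			if (b == nr_buckets) {
				buckets[nr_buckets].path = tree->names[j];
				buckets[nr_buckets++].set.nr = 0;
			}
			toy_set_add(&buckets[b].set, child);
		}
	}

	/* Step 3: recurse into each group. */
	for (int b = 0; b < nr_buckets; b++)
		toy_mark_sparse(&buckets[b].set);
}
```

Consider an UNINTERESTING old root and an interesting new root that share subtree `a` but have different versions of subtree `b`. The sketch inspects four trees (both roots and both versions of `b`) and never opens `a`, because the group for `a` contains only an UNINTERESTING tree.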

There are a few ways in which this is not universally the better option.

First, we can pack extra objects. If someone copies a subtree
from one tree to another, the first tree will appear UNINTERESTING
and we will not recurse to see that the subtree should also be
UNINTERESTING. We will walk the new tree and see the subtree as
a "new" object and add it to the pack. We add a test case that
demonstrates this as a way to prove that the --sparse option is
actually working.
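The copied-subtree case can be reproduced with a toy model. The following is a sketch built on hypothetical in-memory `toy_*` objects. It mirrors the f1/f1 to f3/f4 copy in the test:

- `toy_mark_full()` approximates the old `mark_tree_uninteresting()` walk, which marks everything below an UNINTERESTING tree.
- `toy_mark_sparse()` applies the mixed-set rule per entry name.
- `copy_case_init()` builds an old root containing `f1/f1` and a new root that also has `f3/f4` pointing at the same tree.

```c
#include <assert.h>
#include <string.h>

#define UNINTERESTING 1u
#define MAX_CHILDREN 4
#define MAX_SET 8

/* Hypothetical toy object: a tree (is_tree = 1) or a blob. */
struct toy_obj {
	int is_tree;
	unsigned flags;
	int nr;
	const char *names[MAX_CHILDREN];
	struct toy_obj *children[MAX_CHILDREN];
};

struct toy_set { int nr; struct toy_obj *items[MAX_SET]; };
struct toy_bucket { const char *path; struct toy_set set; };

static void toy_set_add(struct toy_set *s, struct toy_obj *o)
{
	for (int i = 0; i < s->nr; i++)
		if (s->items[i] == o)
			return;
	assert(s->nr < MAX_SET);
	s->items[s->nr++] = o;
}

/* Approximates the old walk: mark everything below an uninteresting tree. */
static void toy_mark_full(struct toy_obj *tree)
{
	for (int j = 0; j < tree->nr; j++) {
		struct toy_obj *child = tree->children[j];
		int seen = child->flags & UNINTERESTING;

		child->flags |= UNINTERESTING;
		if (child->is_tree && !seen)
			toy_mark_full(child);
	}
}

/* Sparse walk: recurse per entry name only while a group is mixed. */
static void toy_mark_sparse(struct toy_set *set)
{
	struct toy_bucket buckets[MAX_CHILDREN * MAX_SET];
	int nr_buckets = 0, has_int = 0, has_unint = 0;

	for (int i = 0; i < set->nr; i++) {
		if (set->items[i]->flags & UNINTERESTING)
			has_unint = 1;
		else
			has_int = 1;
	}
	if (!has_int || !has_unint)
		return;

	for (int i = 0; i < set->nr; i++) {
		struct toy_obj *tree = set->items[i];
		for (int j = 0; j < tree->nr; j++) {
			struct toy_obj *child = tree->children[j];
			int b = 0;

			if (tree->flags & UNINTERESTING)
				child->flags |= UNINTERESTING;
			if (!child->is_tree)
				continue;
			while (b < nr_buckets && strcmp(buckets[b].path, tree->names[j]))
				b++;
			if (b == nr_buckets) {
				buckets[nr_buckets].path = tree->names[j];
				buckets[nr_buckets++].set.nr = 0;
			}
			toy_set_add(&buckets[b].set, child);
		}
	}
	for (int b = 0; b < nr_buckets; b++)
		toy_mark_sparse(&buckets[b].set);
}

/* Old root: f1/f1/data.txt. New root: same f1, plus f3/f4 = copy of f1/f1. */
struct copy_case { struct toy_obj data, t, f1, f3_old, f3_new, old_root, new_root; };

static void copy_case_init(struct copy_case *c)
{
	memset(c, 0, sizeof(*c));
	c->t = (struct toy_obj){ 1, 0, 1, { "data.txt" }, { &c->data } };
	c->f1 = (struct toy_obj){ 1, 0, 1, { "f1" }, { &c->t } };
	c->f3_old = (struct toy_obj){ 1, 0, 0, { 0 }, { 0 } };
	c->f3_new = (struct toy_obj){ 1, 0, 1, { "f4" }, { &c->t } };
	c->old_root = (struct toy_obj){ 1, UNINTERESTING, 2, { "f1", "f3" },
					{ &c->f1, &c->f3_old } };
	c->new_root = (struct toy_obj){ 1, 0, 2, { "f1", "f3" },
					{ &c->f1, &c->f3_new } };
}
```

In the sparse walk, the shared tree stays interesting. The `f1` group holds only an UNINTERESTING tree and the `f4` group holds only an interesting one, so the recursion stops in both places. As a result, the shared tree and its contents would be added to the pack. The full walk reaches the same tree through `f1/f1` and marks it UNINTERESTING.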

Second, we can have extra memory pressure. If, instead of being a
single user pushing a small topic, we are a server sending new
objects from across the entire working directory, then we will
gain very little (the recursion will rarely terminate early) but
will spend extra time maintaining the path-oidset dictionaries.

Despite these potential drawbacks, the benefits of the algorithm
are clear. By adding a counter to 'add_children_by_path' and
'mark_tree_contents_uninteresting', I measured the number of
parsed trees for the two algorithms in a variety of repos.

For git.git, I used the following input:

	v2.19.0
	^v2.19.0~10

 Objects to pack: 550
Walked (old alg): 282
Walked (new alg): 130

For the Linux repo, I used the following input:

	v4.18
	^v4.18~10

 Objects to pack:   518
Walked (old alg): 4,836
Walked (new alg):   188

The two repos above are rather "wide and flat" compared to
other repos that I have used in the past. As a comparison,
I tested an old topic branch in the Azure DevOps repo, which
has a much deeper folder structure than the Linux repo.

 Objects to pack:    220
Walked (old alg): 22,804
Walked (new alg):    129

I used the number of walked trees as the main metric above because
it is consistent across multiple runs. When I ran my tests, the
end-to-end time of the pack-objects command with the same options
could vary by 10x depending on whether the file system cache was
warm. However, by repeating the same test several times I could get
more consistent timing results. The git.git and Linux tests were too
fast overall (less than 0.5s) to measure an end-to-end difference.
The Azure DevOps case was slow enough to see the time improve from
15s to 1s in the warm case. The cold case went from 90s to 9s in my
testing.

These improvements will have even larger benefits in the super-
large Windows repository. In our experiments, we see the
"Enumerate objects" phase of pack-objects taking 60-80% of the
end-to-end time of non-trivial pushes, taking longer than the
network time to send the pack and the server time to verify the
pack.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c                     | 139 ++++++++++++++++++++++++++++++---
 t/t5322-pack-objects-sparse.sh |  21 +++--
 2 files changed, 144 insertions(+), 16 deletions(-)

diff --git a/revision.c b/revision.c
index f9eb6400f1..63bf6230dc 100644
--- a/revision.c
+++ b/revision.c
@@ -27,6 +27,7 @@
 #include "commit-reach.h"
 #include "commit-graph.h"
 #include "prio-queue.h"
+#include "hashmap.h"

 volatile show_early_output_fn_t show_early_output;

@@ -99,29 +100,147 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree)
 	mark_tree_contents_uninteresting(r, tree);
 }

+struct path_and_oids_entry {
+	struct hashmap_entry ent;
+	char *path;
+	struct oidset set;
+};
+
+static int path_and_oids_cmp(const void *hashmap_cmp_fn_data,
+			     const struct path_and_oids_entry *e1,
+			     const struct path_and_oids_entry *e2,
+			     const void *keydata)
+{
+	return strcmp(e1->path, e2->path);
+}
+
+int map_flags = 0;
+static void paths_and_oids_init(struct hashmap *map)
+{
+	hashmap_init(map, (hashmap_cmp_fn) path_and_oids_cmp, &map_flags, 0);
+}
+
+static void paths_and_oids_clear(struct hashmap *map)
+{
+	struct hashmap_iter iter;
+	struct path_and_oids_entry *entry;
+	hashmap_iter_init(map, &iter);
+
+	while ((entry = (struct path_and_oids_entry *)hashmap_iter_next(&iter))) {
+		oidset_clear(&entry->set);
+		free(entry->path);
+	}
+
+	hashmap_free(map, 1);
+}
+
+static void paths_and_oids_insert(struct hashmap *map,
+				  const char *path,
+				  const struct object_id *oid)
+{
+	int hash = strhash(path);
+	struct path_and_oids_entry key;
+	struct path_and_oids_entry *entry;
+
+	hashmap_entry_init(&key, hash);
+	key.path = xstrdup(path);
+	oidset_init(&key.set, 0);
+
+	if (!(entry = (struct path_and_oids_entry *)hashmap_get(map, &key, NULL))) {
+		entry = xcalloc(1, sizeof(struct path_and_oids_entry));
+		hashmap_entry_init(entry, hash);
+		entry->path = key.path;
+		oidset_init(&entry->set, 16);
+		hashmap_put(map, entry);
+	} else {
+		free(key.path);
+	}
+
+	oidset_insert(&entry->set, oid);
+}
+
+static void add_children_by_path(struct repository *r,
+				 struct tree *tree,
+				 struct hashmap *map)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+
+	if (!tree)
+		return;
+
+	if (parse_tree_gently(tree, 1) < 0)
+		return;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+	while (tree_entry(&desc, &entry)) {
+		switch (object_type(entry.mode)) {
+		case OBJ_TREE:
+			paths_and_oids_insert(map, entry.path, entry.oid);
+
+			if (tree->object.flags & UNINTERESTING) {
+				struct tree *child = lookup_tree(r, entry.oid);
+				if (child)
+					child->object.flags |= UNINTERESTING;
+			}
+			break;
+		case OBJ_BLOB:
+			if (tree->object.flags & UNINTERESTING) {
+				struct blob *child = lookup_blob(r, entry.oid);
+				if (child)
+					child->object.flags |= UNINTERESTING;
+			}
+			break;
+		default:
+			/* Subproject commit - not in this repository */
+			break;
+		}
+	}
+
+	free_tree_buffer(tree);
+}
+
 void mark_trees_uninteresting_sparse(struct repository *r,
 				     struct oidset *set)
 {
+	unsigned has_interesting = 0, has_uninteresting = 0;
+	struct hashmap map;
+	struct hashmap_iter map_iter;
+	struct path_and_oids_entry *entry;
 	struct object_id *oid;
 	struct oidset_iter iter;

 	oidset_iter_init(set, &iter);
-	while ((oid = oidset_iter_next(&iter))) {
+	while ((!has_interesting || !has_uninteresting) &&
+	       (oid = oidset_iter_next(&iter))) {
 		struct tree *tree = lookup_tree(r, oid);

 		if (!tree)
 			continue;

-		if (tree->object.flags & UNINTERESTING) {
-			/*
-			 * Remove the flag so the next call
-			 * is not a no-op. The flag is added
-			 * in mark_tree_unintersting().
-			 */
-			tree->object.flags ^= UNINTERESTING;
-			mark_tree_uninteresting(r, tree);
-		}
+		if (tree->object.flags & UNINTERESTING)
+			has_uninteresting = 1;
+		else
+			has_interesting = 1;
+	}
+
+	/* Do not walk unless we have both types of trees. */
+	if (!has_uninteresting || !has_interesting)
+		return;
+
+	paths_and_oids_init(&map);
+
+	oidset_iter_init(set, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct tree *tree = lookup_tree(r, oid);
+		add_children_by_path(r, tree, &map);
 	}
+
+	hashmap_iter_init(&map, &map_iter);
+	while ((entry = hashmap_iter_next(&map_iter)))
+		mark_trees_uninteresting_sparse(r, &entry->set);
+
+	paths_and_oids_clear(&map);
 }

 struct commit_stack {
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 81f6805bc3..45dba6e014 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -83,22 +83,25 @@ test_expect_success 'sparse pack-objects' '
 	test_cmp expect_objects.txt sparse_objects.txt
 '

+# Demonstrate that the algorithms differ when we copy a tree wholesale
+# from one folder to another.
+
 test_expect_success 'duplicate a folder from f1 into f3' '
 	mkdir f3/f4 &&
 	cp -r f1/f1/* f3/f4 &&
 	git add f3/f4 &&
 	git commit -m "Copied f1/f1 to f3/f4" &&
-	cat >packinput.txt <<-EOF &&
+	cat >packinput.txt <<-EOF
 	topic1
 	^topic1~1
 	EOF
-	git rev-parse		\
-		topic1		\
-		topic1^{tree}	\
-		topic1:f3 | sort >expect_objects.txt
 '

 test_expect_success 'non-sparse pack-objects' '
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f3 | sort >expect_objects.txt &&
 	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
 	git index-pack -o nonsparse.idx nonsparse.pack &&
 	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
@@ -106,10 +109,16 @@ test_expect_success 'non-sparse pack-objects' '
 '

 test_expect_success 'sparse pack-objects' '
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f3		\
+		topic1:f3/f4		\
+		topic1:f3/f4/data.txt | sort >expect_sparse_objects.txt &&
 	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
 	git index-pack -o sparse.idx sparse.pack &&
 	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
-	test_cmp expect_objects.txt sparse_objects.txt
+	test_cmp expect_sparse_objects.txt sparse_objects.txt
 '

 test_done
-- 
gitgitgadget


* [PATCH v4 3/6] pack-objects: add --sparse option
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Add a '--sparse' option flag to the pack-objects builtin. This
allows the user to specify that they want to use the new logic
for walking trees. This logic currently produces the same output
as the existing logic, but will differ after a later change.

Create a new test script, t5322-pack-objects-sparse.sh, to ensure
the object list that is selected matches what we expect. When we
update the logic to walk in a sparse fashion, the final test will
be updated to show the extra objects that are added.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-pack-objects.txt |  11 ++-
 builtin/pack-objects.c             |   5 +-
 t/t5322-pack-objects-sparse.sh     | 115 +++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+), 2 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 40c825c381..e45f3e680d 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] < object-list
+	[--shallow] [--keep-true-parents] [--sparse] < object-list
 
 
 DESCRIPTION
@@ -196,6 +196,15 @@ depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
+--sparse::
+	Use the "sparse" algorithm to determine which objects to include in
+	the pack, when combined with the "--revs" option. This algorithm
+	only walks trees that appear in paths that introduce new objects.
+	This can have significant performance benefits when computing
+	a pack to send a small change. However, it is possible that extra
+	objects are added to the pack-file if the included commits contain
+	certain types of direct renames.
+
 --thin::
 	Create a "thin" pack by omitting the common objects between a
 	sender and a receiver in order to reduce network transfer. This
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5f70d840a7..7d5b0735e3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int sparse;
 static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge, 0);
+	mark_edges_uninteresting(&revs, show_edge, sparse);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
@@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
 		  N_("unpack unreachable objects newer than <time>"),
 		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
+		OPT_BOOL(0, "sparse", &sparse,
+			 N_("use the sparse reachability algorithm")),
 		OPT_BOOL(0, "thin", &thin,
 			 N_("create thin packs")),
 		OPT_BOOL(0, "shallow", &shallow,
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
new file mode 100755
index 0000000000..81f6805bc3
--- /dev/null
+++ b/t/t5322-pack-objects-sparse.sh
@@ -0,0 +1,115 @@
+#!/bin/sh
+
+test_description='pack-objects object selection using sparse algorithm'
+. ./test-lib.sh
+
+test_expect_success 'setup repo' '
+	test_commit initial &&
+	for i in $(test_seq 1 3)
+	do
+		mkdir f$i &&
+		for j in $(test_seq 1 3)
+		do
+			mkdir f$i/f$j &&
+			echo $j >f$i/f$j/data.txt
+		done
+	done &&
+	git add . &&
+	git commit -m "Initialized trees" &&
+	for i in $(test_seq 1 3)
+	do
+		git checkout -b topic$i master &&
+		echo change-$i >f$i/f$i/data.txt &&
+		git commit -a -m "Changed f$i/f$i/data.txt"
+	done &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic2
+	^topic3
+	EOF
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+# Demonstrate that both algorithms send "extra" objects because
+# they are not in the frontier.
+
+test_expect_success 'duplicate a folder from f3 and commit to topic1' '
+	git checkout topic1 &&
+	echo change-3 >f3/f3/data.txt &&
+	git commit -a -m "Changed f3/f3/data.txt" &&
+	git rev-parse			\
+		topic1~1		\
+		topic1~1^{tree}		\
+		topic1^{tree}		\
+		topic1			\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt	\
+		topic1:f3		\
+		topic1:f3/f3		\
+		topic1:f3/f3/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f1 into f3' '
+	mkdir f3/f4 &&
+	cp -r f1/f1/* f3/f4 &&
+	git add f3/f4 &&
+	git commit -m "Copied f1/f1 to f3/f4" &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic1~1
+	EOF
+	git rev-parse		\
+		topic1		\
+		topic1^{tree}	\
+		topic1:f3 | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v4 2/6] list-objects: consume sparse tree walk
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would mark the local topic branch as interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at interesting commits ends in a root
commit or an uninteresting commit. We then need to discover which
non-commit objects are reachable from uninteresting commits. This
commit walk does not change in this series.

The mark_edges_uninteresting() method in list-objects.c iterates on
the commit list and does the following:

* If the commit is UNINTERESTING, then mark its root tree and every
  object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
  UNINTERESTING parent (and all objects that tree can reach) as
  UNINTERESTING.
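To make the two cases concrete, here is a minimal C sketch of the
non-sparse marking. It assumes a hypothetical in-memory model
(toy_obj, toy_commit and a UNINTERESTING bit) rather than git's real
object structs, and it omits the show_edge() callback and the
edge_hint_aggressive handling.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING 1u

/* Hypothetical toy object: a tree lists its children; a blob has none. */
struct toy_obj {
	unsigned flags;
	size_t nr_children;
	struct toy_obj *children[4];
};

/* Hypothetical toy commit with at most two parents. */
struct toy_commit {
	unsigned flags;
	struct toy_obj *tree;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

/* Like mark_tree_uninteresting(): never re-walk an already-marked object. */
static void toy_mark_tree_uninteresting(struct toy_obj *obj)
{
	size_t i;

	if (!obj || (obj->flags & UNINTERESTING))
		return;
	obj->flags |= UNINTERESTING;
	for (i = 0; i < obj->nr_children; i++)
		toy_mark_tree_uninteresting(obj->children[i]);
}

/* Uninteresting commit: mark its tree. Interesting commit: mark the
 * trees of its uninteresting parents. */
static void toy_mark_edges_uninteresting(struct toy_commit **commits, size_t nr)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		struct toy_commit *commit = commits[i];

		if (commit->flags & UNINTERESTING) {
			toy_mark_tree_uninteresting(commit->tree);
			continue;
		}
		assert(commit->nr_parents <= 2);
		for (j = 0; j < commit->nr_parents; j++)
			if (commit->parents[j]->flags & UNINTERESTING)
				toy_mark_tree_uninteresting(commit->parents[j]->tree);
	}
}
```

Note that the whole tree of an uninteresting parent is walked, even
the parts the interesting commit never touched.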

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise were not included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERESTING.

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly different way. As we iterate
over the commits, we add all of the root trees to an oidset. Then, we
call mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_uninteresting_sparse() will walk the same trees as
the old logic, but this will be replaced in a later change.
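A minimal C sketch of the collection step in sparse mode, assuming
hypothetical toy types: toy_tree_set is a small deduplicated array
standing in for an oidset, and commits carry at most two parents.
Both interesting and UNINTERESTING root trees end up in the set; only
the flag on each tree differs.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING 1u
#define MAX_TREES 64

struct toy_tree {
	unsigned flags;
};

struct toy_commit {
	unsigned flags;
	struct toy_tree *tree;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

/* Hypothetical stand-in for an oidset: a deduplicated array of trees. */
struct toy_tree_set {
	size_t nr;
	struct toy_tree *items[MAX_TREES];
};

static void toy_set_insert(struct toy_tree_set *set, struct toy_tree *tree)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->items[i] == tree)
			return;
	assert(set->nr < MAX_TREES);
	set->items[set->nr++] = tree;
}

/* Collect the root trees of each commit and of its parents, copying the
 * commit's UNINTERESTING flag onto its root tree. */
static void toy_collect_root_trees(struct toy_commit **commits, size_t nr,
				   struct toy_tree_set *set)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		struct toy_commit *commit = commits[i];

		if (commit->flags & UNINTERESTING)
			commit->tree->flags |= UNINTERESTING;
		toy_set_insert(set, commit->tree);

		for (j = 0; j < commit->nr_parents; j++) {
			struct toy_commit *parent = commit->parents[j];

			toy_set_insert(set, parent->tree);
			if (parent->flags & UNINTERESTING)
				parent->tree->flags |= UNINTERESTING;
		}
	}
}
```

The set is then handed to a single call that decides how deep to walk,
instead of walking each uninteresting tree as soon as it is found.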

The sparse option is not used by any callers at the moment, but
will be wired to 'git pack-objects' in the next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 bisect.c               |  2 +-
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  2 +-
 http-push.c            |  2 +-
 list-objects.c         | 70 +++++++++++++++++++++++++++++++++++-------
 list-objects.h         |  4 ++-
 6 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/bisect.c b/bisect.c
index 487675c672..842f8b4b8f 100644
--- a/bisect.c
+++ b/bisect.c
@@ -656,7 +656,7 @@ static void bisect_common(struct rev_info *revs)
 	if (prepare_revision_walk(revs))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
-		mark_edges_uninteresting(revs, NULL);
+		mark_edges_uninteresting(revs, NULL, 0);
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 411aefd687..5f70d840a7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3135,7 +3135,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge);
+	mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 2880ed37e3..9663cbfae0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -543,7 +543,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	if (revs.tree_objects)
-		mark_edges_uninteresting(&revs, show_edge);
+		mark_edges_uninteresting(&revs, show_edge, 0);
 
 	if (bisect_list) {
 		int reaches, all;
diff --git a/http-push.c b/http-push.c
index cd48590912..ea52d6f9f6 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1933,7 +1933,7 @@ int cmd_main(int argc, const char **argv)
 		pushing = 0;
 		if (prepare_revision_walk(&revs))
 			die("revision walk setup failed");
-		mark_edges_uninteresting(&revs, NULL);
+		mark_edges_uninteresting(&revs, NULL, 0);
 		objects_to_send = get_delta(&revs, ref_lock);
 		finish_all_active_slots();
 
diff --git a/list-objects.c b/list-objects.c
index c41cc80db5..fb728f7842 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -222,25 +222,73 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
 	}
 }
 
-void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
+static void add_edge_parents(struct commit *commit,
+			     struct rev_info *revs,
+			     show_edge_fn show_edge,
+			     struct oidset *set)
+{
+	struct commit_list *parents;
+
+	for (parents = commit->parents; parents; parents = parents->next) {
+		struct commit *parent = parents->item;
+		struct tree *tree = get_commit_tree(parent);
+
+		if (!tree)
+			continue;
+
+		oidset_insert(set, &tree->object.oid);
+
+		if (!(parent->object.flags & UNINTERESTING))
+			continue;
+		tree->object.flags |= UNINTERESTING;
+
+		if (revs->edge_hint && !(parent->object.flags & SHOWN)) {
+			parent->object.flags |= SHOWN;
+			show_edge(parent);
+		}
+	}
+}
+
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse)
 {
 	struct commit_list *list;
 	int i;
 
-	for (list = revs->commits; list; list = list->next) {
-		struct commit *commit = list->item;
+	if (sparse) {
+		struct oidset set;
+		oidset_init(&set, 16);
 
-		if (commit->object.flags & UNINTERESTING) {
-			mark_tree_uninteresting(revs->repo,
-						get_commit_tree(commit));
-			if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
-				commit->object.flags |= SHOWN;
-				show_edge(commit);
+		for (list = revs->commits; list; list = list->next) {
+			struct commit *commit = list->item;
+			struct tree *tree = get_commit_tree(commit);
+
+			if (commit->object.flags & UNINTERESTING)
+				tree->object.flags |= UNINTERESTING;
+
+			oidset_insert(&set, &tree->object.oid);
+			add_edge_parents(commit, revs, show_edge, &set);
+		}
+
+		mark_trees_uninteresting_sparse(revs->repo, &set);
+		oidset_clear(&set);
+	} else {
+		for (list = revs->commits; list; list = list->next) {
+			struct commit *commit = list->item;
+			if (commit->object.flags & UNINTERESTING) {
+				mark_tree_uninteresting(revs->repo,
+							get_commit_tree(commit));
+				if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
+					commit->object.flags |= SHOWN;
+					show_edge(commit);
+				}
+				continue;
 			}
-			continue;
+			mark_edge_parents_uninteresting(commit, revs, show_edge);
 		}
-		mark_edge_parents_uninteresting(commit, revs, show_edge);
 	}
+
 	if (revs->edge_hint_aggressive) {
 		for (i = 0; i < revs->cmdline.nr; i++) {
 			struct object *obj = revs->cmdline.rev[i].item;
diff --git a/list-objects.h b/list-objects.h
index ad40762926..a952680e46 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -10,7 +10,9 @@ typedef void (*show_object_fn)(struct object *, const char *, void *);
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
-void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
+void mark_edges_uninteresting(struct rev_info *revs,
+			      show_edge_fn show_edge,
+			      int sparse);
 
 struct oidset;
 struct list_objects_filter_options;
-- 
gitgitgadget



* [PATCH v4 1/6] revision: add mark_tree_uninteresting_sparse
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano, Derrick Stolee
In-Reply-To: <pull.89.v4.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In preparation for a new algorithm that walks fewer trees when
creating a pack from a set of revisions, create a method that
takes an oidset of tree oids and marks reachable objects as
UNINTERESTING.

The current implementation uses the existing
mark_tree_uninteresting() method to recursively walk the trees and
blobs. This walks the same number of trees as the old mechanism.

There is one new assumption in this approach: we are also given
the oids of the interesting trees. This implementation does not
use those trees at the moment, but we will use them in a later
rewrite of this method.
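A minimal C sketch of why this implementation clears the flag before
recursing, using a hypothetical toy_obj model rather than git's struct
tree. The recursive helper returns early on any object that is already
UNINTERESTING, so without the toggle the root trees flagged by the
caller would never have their contents marked.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING 1u

/* Hypothetical toy object: trees have children, blobs have none. */
struct toy_obj {
	unsigned flags;
	size_t nr_children;
	struct toy_obj *children[4];
};

/* Like mark_tree_uninteresting(): a no-op on already-marked objects. */
static void toy_mark_tree_uninteresting(struct toy_obj *obj)
{
	size_t i;

	if (!obj || (obj->flags & UNINTERESTING))
		return;
	obj->flags |= UNINTERESTING;
	for (i = 0; i < obj->nr_children; i++)
		toy_mark_tree_uninteresting(obj->children[i]);
}

static void toy_mark_trees_uninteresting_sparse(struct toy_obj **trees, size_t nr)
{
	size_t i;

	assert(trees || !nr);
	for (i = 0; i < nr; i++) {
		struct toy_obj *tree = trees[i];

		if (!tree)
			continue;
		if (tree->flags & UNINTERESTING) {
			/* Clear the flag so the recursive call is not a no-op. */
			tree->flags ^= UNINTERESTING;
			toy_mark_tree_uninteresting(tree);
		}
		/* Interesting trees are ignored in this first version. */
	}
}
```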

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 25 +++++++++++++++++++++++++
 revision.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/revision.c b/revision.c
index 13e0519c02..f9eb6400f1 100644
--- a/revision.c
+++ b/revision.c
@@ -99,6 +99,31 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree)
 	mark_tree_contents_uninteresting(r, tree);
 }
 
+void mark_trees_uninteresting_sparse(struct repository *r,
+				     struct oidset *set)
+{
+	struct object_id *oid;
+	struct oidset_iter iter;
+
+	oidset_iter_init(set, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct tree *tree = lookup_tree(r, oid);
+
+		if (!tree)
+			continue;
+
+		if (tree->object.flags & UNINTERESTING) {
+			/*
+			 * Remove the flag so the next call
+			 * is not a no-op. The flag is added
+			 * in mark_tree_unintersting().
+			 */
+			tree->object.flags ^= UNINTERESTING;
+			mark_tree_uninteresting(r, tree);
+		}
+	}
+}
+
 struct commit_stack {
 	struct commit **items;
 	size_t nr, alloc;
diff --git a/revision.h b/revision.h
index 7987bfcd2e..f828e91ae9 100644
--- a/revision.h
+++ b/revision.h
@@ -67,6 +67,7 @@ struct rev_cmdline_info {
 #define REVISION_WALK_NO_WALK_SORTED 1
 #define REVISION_WALK_NO_WALK_UNSORTED 2
 
+struct oidset;
 struct topo_walk_info;
 
 struct rev_info {
@@ -327,6 +328,7 @@ void put_revision_mark(const struct rev_info *revs,
 
 void mark_parents_uninteresting(struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
+void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *set);
 
 void show_object_with_name(FILE *, struct object *, const char *);
 
-- 
gitgitgadget



* [PATCH v4 0/6] Add a new "sparse" tree walk algorithm
From: Derrick Stolee via GitGitGadget @ 2018-12-14 21:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, jrnieder, Junio C Hamano
In-Reply-To: <pull.89.v3.git.gitgitgadget@gmail.com>

One of the biggest remaining pain points for users of very large
repositories is the time it takes to run 'git push'. We inspected some slow
pushes by our developers and found that the "Enumerating Objects" phase of a
push was very slow. This is unsurprising, because this is the problem that
reachability bitmaps solve. However, reachability bitmaps are not available
to us because of the single pack-file requirement. The bitmap approach is
intended for servers anyway, and clients have a very different behavior
pattern.

Specifically, clients are normally pushing a very small number of objects
compared to the entire working directory. A typical user changes only a
small cone of the working directory, so let's use that to our benefit.

Create a new "sparse" mode for 'git pack-objects' that uses the paths that
introduce new objects to direct our search into the reachable trees. By
collecting trees at each path, we can then recurse into a path only when
there are both uninteresting and interesting trees at that path. This gives
a significant performance boost for small topics, at the cost of possibly
packing extra objects.
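A minimal C sketch of the path-grouping idea, using hypothetical toy
types in place of git's tree parsing and the path-to-oidset map.
Children are grouped by entry name, children of UNINTERESTING trees
inherit the flag, and recursion into a path happens only when its
group holds both kinds of trees. Fixed-size arrays keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNINTERESTING 1u
#define MAX_ENTRIES 4
#define MAX_GROUP 8
#define MAX_PATHS 8

struct toy_obj;

struct toy_entry {
	const char *name;
	struct toy_obj *obj;
};

/* Hypothetical toy object: a tree (is_tree = 1) or a blob. */
struct toy_obj {
	unsigned flags;
	int is_tree;
	size_t nr;
	struct toy_entry entries[MAX_ENTRIES];
};

/* Hypothetical stand-in for one path -> oidset map entry. */
struct toy_group {
	const char *path;
	size_t nr;
	struct toy_obj *trees[MAX_GROUP];
};

static void add_to_group(struct toy_group *groups, size_t *nr_groups,
			 const char *path, struct toy_obj *tree)
{
	size_t i, j;

	for (i = 0; i < *nr_groups; i++)
		if (!strcmp(groups[i].path, path))
			break;
	if (i == *nr_groups) {
		assert(*nr_groups < MAX_PATHS);
		groups[i].path = path;
		groups[i].nr = 0;
		(*nr_groups)++;
	}
	for (j = 0; j < groups[i].nr; j++)
		if (groups[i].trees[j] == tree)
			return;
	assert(groups[i].nr < MAX_GROUP);
	groups[i].trees[groups[i].nr++] = tree;
}

static void toy_sparse_walk(struct toy_obj **trees, size_t nr)
{
	struct toy_group groups[MAX_PATHS];
	size_t nr_groups = 0, i, j;
	int has_interesting = 0, has_uninteresting = 0;

	for (i = 0; i < nr; i++) {
		if (trees[i]->flags & UNINTERESTING)
			has_uninteresting = 1;
		else
			has_interesting = 1;
	}
	/* Do not walk unless we have both types of trees. */
	if (!has_interesting || !has_uninteresting)
		return;

	for (i = 0; i < nr; i++) {
		struct toy_obj *tree = trees[i];

		for (j = 0; j < tree->nr; j++) {
			struct toy_entry *e = &tree->entries[j];

			if (tree->flags & UNINTERESTING)
				e->obj->flags |= UNINTERESTING;
			if (e->obj->is_tree)
				add_to_group(groups, &nr_groups, e->name, e->obj);
		}
	}
	for (i = 0; i < nr_groups; i++)
		toy_sparse_walk(groups[i].trees, groups[i].nr);
}
```

In a scenario where both root trees share an unchanged subtree at one
path, that subtree gets the UNINTERESTING flag but its contents are
never visited, which is where the savings come from.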

The main algorithm change is in patch 4, but patches 1 and 2 lay some of
the groundwork for it.

As demonstrated in the included test script, the existing algorithm can
send extra objects due to the way we specify the "frontier". The sparse
algorithm can send even more objects if a user copies objects from one
folder to another. I say "copy" because a rename would (usually) change the
original folder and trigger a walk into that path, discovering the objects.

In order to benefit from this approach, the user can opt in using the
pack.useSparse config setting. This setting can be overridden using the
'--no-sparse' option.
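A minimal C sketch of that precedence, assuming the config value is
read first and the command line is scanned afterwards so the last
explicit flag wins. The function name and the argument scan are
hypothetical; the real builtin uses git's config and parse-options
machinery.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: start from pack.useSparse, then let --sparse or
 * --no-sparse on the command line override it. */
static int resolve_use_sparse(int config_value, int argc, const char **argv)
{
	int sparse = config_value;
	int i;

	assert(argc >= 0);
	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--sparse"))
			sparse = 1;
		else if (!strcmp(argv[i], "--no-sparse"))
			sparse = 0;
	}
	return sparse;
}
```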

Update in V2: 

 * Added GIT_TEST_PACK_SPARSE test option.
 * Fixed test breakages when GIT_TEST_PACK_SPARSE is enabled by adding null
   checks.

Update in V3:

 * Changed the documentation of the 'pack.useSparse' config setting to
   better advertise it to users.
 * Mostly a ping now that v2.20.0 is out.

Updates in V4:

 * Switched to using hashmap instead of string_list for the path/oidset
   dictionary. (This is due to some fear that the string_list performance
   would degrade for a very wide tree, but I am unable to measure a
   performance difference.)
 * Some cleanup of code snippets across commits.
 * Some grammar cleanup in the commit messages.
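For readers unfamiliar with the change, a minimal C sketch of a
path-keyed map with separate chaining. It is hypothetical: it uses an
FNV-1a hash and a counter (nr_oids) as a stand-in for the per-path
oidset, rather than git's hashmap.h API and strhash(). Lookups cost
one hash plus a short chain scan, independent of how wide the tree is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_BUCKETS 16

struct path_entry {
	struct path_entry *next;
	char *path;
	size_t nr_oids;	/* hypothetical stand-in for the per-path oidset */
};

struct path_map {
	struct path_entry *buckets[NR_BUCKETS];
};

/* 32-bit FNV-1a string hash. */
static unsigned fnv1a(const char *s)
{
	unsigned h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Return the entry for path, creating it on first use. */
static struct path_entry *path_map_get_or_add(struct path_map *map,
					      const char *path)
{
	unsigned b;
	struct path_entry *e;

	assert(path);
	b = fnv1a(path) % NR_BUCKETS;
	for (e = map->buckets[b]; e; e = e->next)
		if (!strcmp(e->path, path))
			return e;

	e = calloc(1, sizeof(*e));
	if (!e)
		abort();
	e->path = malloc(strlen(path) + 1);
	if (!e->path)
		abort();
	strcpy(e->path, path);
	e->next = map->buckets[b];
	map->buckets[b] = e;
	return e;
}

static void path_map_clear(struct path_map *map)
{
	size_t i;

	for (i = 0; i < NR_BUCKETS; i++) {
		struct path_entry *e = map->buckets[i];

		while (e) {
			struct path_entry *next = e->next;

			free(e->path);
			free(e);
			e = next;
		}
		map->buckets[i] = NULL;
	}
}
```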

Derrick Stolee (6):
  revision: add mark_tree_uninteresting_sparse
  list-objects: consume sparse tree walk
  pack-objects: add --sparse option
  revision: implement sparse algorithm
  pack-objects: create pack.useSparse setting
  pack-objects: create GIT_TEST_PACK_SPARSE

 Documentation/config/pack.txt      |   9 ++
 Documentation/git-pack-objects.txt |  11 ++-
 bisect.c                           |   2 +-
 builtin/pack-objects.c             |  10 +-
 builtin/rev-list.c                 |   2 +-
 http-push.c                        |   2 +-
 list-objects.c                     |  70 +++++++++++---
 list-objects.h                     |   4 +-
 revision.c                         | 144 +++++++++++++++++++++++++++++
 revision.h                         |   2 +
 t/README                           |   4 +
 t/t5322-pack-objects-sparse.sh     | 139 ++++++++++++++++++++++++++++
 12 files changed, 382 insertions(+), 17 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh


base-commit: a1598010f775d82b5adf12c29d0f5bc9b41434c6
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-89%2Fderrickstolee%2Fpush%2Fsparse-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-89/derrickstolee/push/sparse-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/89

Range-diff vs v3:

 1:  60617681f7 ! 1:  817e30a287 revision: add mark_tree_uninteresting_sparse
     @@ -35,6 +35,9 @@
      +	while ((oid = oidset_iter_next(&iter))) {
      +		struct tree *tree = lookup_tree(r, oid);
      +
     ++		if (!tree)
     ++			continue;
     ++
      +		if (tree->object.flags & UNINTERESTING) {
      +			/*
      +			 * Remove the flag so the next call
 2:  4527addacb ! 2:  39dc89beb9 list-objects: consume sparse tree walk
     @@ -11,9 +11,10 @@
          We walk these commits until we discover a frontier of commits such
          that every commit walk starting at interesting commits ends in a root
          commit or unintersting commit. We then need to discover which
     -    non-commit objects are reachable from  uninteresting commits.
     +    non-commit objects are reachable from  uninteresting commits. This
     +    commit walk is not changing during this series.
      
     -    The mark_edges_unintersting() method in list-objects.c iterates on
     +    The mark_edges_uninteresting() method in list-objects.c iterates on
          the commit list and does the following:
      
          * If the commit is UNINTERSTING, then mark its root tree and every
     @@ -138,17 +139,22 @@
      +			      int sparse)
       {
       	struct commit_list *list;
     -+	struct oidset set;
       	int i;
       
     -+	if (sparse)
     +-	for (list = revs->commits; list; list = list->next) {
     +-		struct commit *commit = list->item;
     ++	if (sparse) {
     ++		struct oidset set;
      +		oidset_init(&set, 16);
     -+
     - 	for (list = revs->commits; list; list = list->next) {
     - 		struct commit *commit = list->item;
       
      -		if (commit->object.flags & UNINTERESTING) {
     -+		if (sparse) {
     +-			mark_tree_uninteresting(revs->repo,
     +-						get_commit_tree(commit));
     +-			if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
     +-				commit->object.flags |= SHOWN;
     +-				show_edge(commit);
     ++		for (list = revs->commits; list; list = list->next) {
     ++			struct commit *commit = list->item;
      +			struct tree *tree = get_commit_tree(commit);
      +
      +			if (commit->object.flags & UNINTERESTING)
     @@ -156,24 +162,27 @@
      +
      +			oidset_insert(&set, &tree->object.oid);
      +			add_edge_parents(commit, revs, show_edge, &set);
     -+		} else if (commit->object.flags & UNINTERESTING) {
     - 			mark_tree_uninteresting(revs->repo,
     - 						get_commit_tree(commit));
     - 			if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
     - 				commit->object.flags |= SHOWN;
     - 				show_edge(commit);
     ++		}
     ++
     ++		mark_trees_uninteresting_sparse(revs->repo, &set);
     ++		oidset_clear(&set);
     ++	} else {
     ++		for (list = revs->commits; list; list = list->next) {
     ++			struct commit *commit = list->item;
     ++			if (commit->object.flags & UNINTERESTING) {
     ++				mark_tree_uninteresting(revs->repo,
     ++							get_commit_tree(commit));
     ++				if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
     ++					commit->object.flags |= SHOWN;
     ++					show_edge(commit);
     ++				}
     ++				continue;
       			}
      -			continue;
     -+		} else {
      +			mark_edge_parents_uninteresting(commit, revs, show_edge);
       		}
      -		mark_edge_parents_uninteresting(commit, revs, show_edge);
       	}
     -+
     -+	if (sparse) {
     -+		mark_trees_uninteresting_sparse(revs->repo, &set);
     -+		oidset_clear(&set);
     -+	}
      +
       	if (revs->edge_hint_aggressive) {
       		for (i = 0; i < revs->cmdline.nr; i++) {
     @@ -193,17 +202,3 @@
       
       struct oidset;
       struct list_objects_filter_options;
     -
     -diff --git a/revision.c b/revision.c
     ---- a/revision.c
     -+++ b/revision.c
     -@@
     - 	while ((oid = oidset_iter_next(&iter))) {
     - 		struct tree *tree = lookup_tree(r, oid);
     - 
     -+		if (!tree)
     -+			continue;
     -+
     - 		if (tree->object.flags & UNINTERESTING) {
     - 			/*
     - 			 * Remove the flag so the next call
 3:  4ef318bdb2 = 3:  ab733daff5 pack-objects: add --sparse option
 4:  571b2e2784 ! 4:  c44172c35e revision: implement sparse algorithm
     @@ -6,8 +6,9 @@
          pack-objects --revs', we discover the "frontier" of commits
          that we care about and the boundary with commit we find
          uninteresting. From that point, we walk trees to discover which
     -    trees and blobs are uninteresting. Finally, we walk trees to find
     -    the interesting trees.
     +    trees and blobs are uninteresting. Finally, we walk trees from the
     +    interesting commits to find the interesting objects that are
     +    placed in the pack.
      
          This commit introduces a new, "sparse" way to discover the
          uninteresting trees. We use the perspective of a single user trying
     @@ -31,11 +32,13 @@
          parsing is UNINTERESTING.
      
          To actually improve the peformance, we need to terminate our
     -    recursion unless the oidset contains some intersting trees and
     -    some uninteresting trees. Technically, we only need one interesting
     -    tree for this to speed up in most cases, but we also will not mark
     -    anything UNINTERESTING if there are no uninteresting trees, so
     -    that would be wasted effort.
     +    recursion. If the oidset contains only UNINTERESTING trees, then
     +    we do not continue the recursion. This avoids walking trees that
     +    are likely to not be reachable from interesting trees. If the
     +    oidset contains only interesting trees, then we will walk these
     +    trees in the final stage that collects the intersting objects to
     +    place in the pack. Thus, we only recurse if the oidset contains
     +    both interesting and UNINITERESTING trees.
      
          There are a few ways that this is not a universally better option.
      
     @@ -108,51 +111,80 @@
      diff --git a/revision.c b/revision.c
      --- a/revision.c
      +++ b/revision.c
     +@@
     + #include "commit-reach.h"
     + #include "commit-graph.h"
     + #include "prio-queue.h"
     ++#include "hashmap.h"
     + 
     + volatile show_early_output_fn_t show_early_output;
     + 
      @@
       	mark_tree_contents_uninteresting(r, tree);
       }
       
     -+struct paths_and_oids {
     -+	struct string_list list;
     ++struct path_and_oids_entry {
     ++	struct hashmap_entry ent;
     ++	char *path;
     ++	struct oidset set;
      +};
      +
     -+static void paths_and_oids_init(struct paths_and_oids *po)
     ++static int path_and_oids_cmp(const void *hashmap_cmp_fn_data,
     ++			     const struct path_and_oids_entry *e1,
     ++			     const struct path_and_oids_entry *e2,
     ++			     const void *keydata)
     ++{
     ++	return strcmp(e1->path, e2->path);
     ++}
     ++
     ++int map_flags = 0;
     ++static void paths_and_oids_init(struct hashmap *map)
      +{
     -+	string_list_init(&po->list, 1);
     ++	hashmap_init(map, (hashmap_cmp_fn) path_and_oids_cmp, &map_flags, 0);
      +}
      +
     -+static void paths_and_oids_clear(struct paths_and_oids *po)
     ++static void paths_and_oids_clear(struct hashmap *map)
      +{
     -+	int i;
     -+	for (i = 0; i < po->list.nr; i++) {
     -+		oidset_clear(po->list.items[i].util);
     -+		free(po->list.items[i].util);
     ++	struct hashmap_iter iter;
     ++	struct path_and_oids_entry *entry;
     ++	hashmap_iter_init(map, &iter);
     ++
     ++	while ((entry = (struct path_and_oids_entry *)hashmap_iter_next(&iter))) {
     ++		oidset_clear(&entry->set);
     ++		free(entry->path);
      +	}
      +
     -+	string_list_clear(&po->list, 0);
     ++	hashmap_free(map, 1);
      +}
      +
     -+static void paths_and_oids_insert(struct paths_and_oids *po,
     ++static void paths_and_oids_insert(struct hashmap *map,
      +				  const char *path,
      +				  const struct object_id *oid)
      +{
     -+	struct string_list_item *item = string_list_insert(&po->list, path);
     -+	struct oidset *set;
     ++	int hash = strhash(path);
     ++	struct path_and_oids_entry key;
     ++	struct path_and_oids_entry *entry;
     ++
     ++	hashmap_entry_init(&key, hash);
     ++	key.path = xstrdup(path);
     ++	oidset_init(&key.set, 0);
      +
     -+	if (!item->util) {
     -+		set = xcalloc(1, sizeof(struct oidset));
     -+		oidset_init(set, 16);
     -+		item->util = set;
     ++	if (!(entry = (struct path_and_oids_entry *)hashmap_get(map, &key, NULL))) {
     ++		entry = xcalloc(1, sizeof(struct path_and_oids_entry));
     ++		hashmap_entry_init(entry, hash);
     ++		entry->path = key.path;
     ++		oidset_init(&entry->set, 16);
     ++		hashmap_put(map, entry);
      +	} else {
     -+		set = item->util;
     ++		free(key.path);
      +	}
      +
     -+	oidset_insert(set, oid);
     ++	oidset_insert(&entry->set, oid);
      +}
      +
      +static void add_children_by_path(struct repository *r,
      +				 struct tree *tree,
     -+				 struct paths_and_oids *po)
     ++				 struct hashmap *map)
      +{
      +	struct tree_desc desc;
      +	struct name_entry entry;
     @@ -167,7 +199,7 @@
      +	while (tree_entry(&desc, &entry)) {
      +		switch (object_type(entry.mode)) {
      +		case OBJ_TREE:
     -+			paths_and_oids_insert(po, entry.path, entry.oid);
     ++			paths_and_oids_insert(map, entry.path, entry.oid);
      +
      +			if (tree->object.flags & UNINTERESTING) {
      +				struct tree *child = lookup_tree(r, entry.oid);
     @@ -194,9 +226,10 @@
       void mark_trees_uninteresting_sparse(struct repository *r,
       				     struct oidset *set)
       {
     -+	int i;
      +	unsigned has_interesting = 0, has_uninteresting = 0;
     -+	struct paths_and_oids po;
     ++	struct hashmap map;
     ++	struct hashmap_iter map_iter;
     ++	struct path_and_oids_entry *entry;
       	struct object_id *oid;
       	struct oidset_iter iter;
       
     @@ -222,25 +255,25 @@
      +			has_uninteresting = 1;
      +		else
      +			has_interesting = 1;
     - 	}
     ++	}
      +
      +	/* Do not walk unless we have both types of trees. */
      +	if (!has_uninteresting || !has_interesting)
      +		return;
      +
     -+	paths_and_oids_init(&po);
     ++	paths_and_oids_init(&map);
      +
      +	oidset_iter_init(set, &iter);
      +	while ((oid = oidset_iter_next(&iter))) {
      +		struct tree *tree = lookup_tree(r, oid);
     -+		add_children_by_path(r, tree, &po);
     -+	}
     ++		add_children_by_path(r, tree, &map);
     + 	}
      +
     -+	for (i = 0; i < po.list.nr; i++)
     -+		mark_trees_uninteresting_sparse(
     -+			r, (struct oidset *)po.list.items[i].util);
     ++	hashmap_iter_init(&map, &map_iter);
     ++	while ((entry = hashmap_iter_next(&map_iter)))
     ++		mark_trees_uninteresting_sparse(r, &entry->set);
      +
     -+	paths_and_oids_clear(&po);
     ++	paths_and_oids_clear(&map);
       }
       
       struct commit_stack {
 5:  33d2c04dd6 = 5:  f386f6c3c9 pack-objects: create pack.useSparse setting
 6:  e4f29543ee = 6:  d011a9c1b1 pack-objects: create GIT_TEST_PACK_SPARSE

-- 
gitgitgadget

^ permalink raw reply

* Bug in lineendings handling that prevents resetting checking out, rebasing etc
From: Mr&Mrs D @ 2018-12-14 21:04 UTC (permalink / raw)
  To: git

Hi all,

I maintain a python project you can clone from:

git@github.com:wrye-bash/wrye-bash.git

For reasons unknown, git sees a particular file as changed
(Mopy/Docs/Bash Readme Template.html, sometimes others too). This file
was probably committed with CRLF line endings to the svn repository this
git repo was created from. When we moved to git we added a
gitattributes file (
https://github.com/wrye-bash/wrye-bash/blob/dev/.gitattributes ) and
this file was edited to explicitly state that htm files are text, all to
no avail. From time to time on Windows (for instance when checking out
an old commit) git would see that file as changed. The only workaround
that worked for me was

    git rm -r . --cached -q && git reset --hard
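
One plausible mechanism, offered here as an editorial assumption rather than something established in this thread: once `.gitattributes` marks a path as `text`, Git converts CRLF to LF when it "cleans" the working-tree file for comparison, so a blob that was originally committed *with* CRLF may never match the cleaned content, and the file keeps showing as modified. The toy C model below captures only that comparison; `clean_crlf()` and `looks_modified()` are hypothetical helpers, not Git's convert.c code, and the stat cache is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `in` to `out`, turning every CRLF pair into a
 * bare LF (a simplified model of the "clean" side of text conversion).
 * `out` must have room for at least `len` bytes. Returns the new length.
 */
static size_t clean_crlf(const char *in, size_t len, char *out)
{
	size_t i, n = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < len; i++) {
		if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
			continue;	/* drop the CR; the LF is copied next */
		out[n++] = in[i];
	}
	return n;
}

/*
 * Hypothetical model of the status check: the working-tree file counts as
 * modified when its cleaned bytes differ from the blob in the repository.
 * `scratch` must have room for at least `wt_len` bytes.
 */
static bool looks_modified(const char *blob, size_t blob_len,
			   const char *worktree, size_t wt_len,
			   char *scratch)
{
	size_t n = clean_crlf(worktree, wt_len, scratch);

	return n != blob_len || memcmp(scratch, blob, n) != 0;
}
```

In this model a CRLF blob checked out unchanged still compares as modified, while an LF blob whose checkout has CRLF compares as clean.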

For more details and discussion see this SO question I posted almost
five years ago:

https://stackoverflow.com/questions/21122094/git-line-endings-cant-stash-reset-and-now-cant-rebase-over-spurious-line-en

I used to work on Windows and the bug was tolerable, as there was that
workaround. Now I have moved to Mac and no workaround works anymore; we
have a special page on our wiki with workarounds for this one, btw:

https://github.com/wrye-bash/wrye-bash/wiki/%5Bgit%5D-Issues-with-line-endings-preventing-checking,-merge,-etc

So after 5 years and countless hours trying to solve this, I am reaching
out to you guys and girls: _this is a full-time bug in git line
endings handling_. When someone issues a git reset --hard, it should
work no matter what; well, it does not. So this bug may really be a
can of worms.

Could someone please clone this repo on Linux or Mac? Probably just
cloning will make the files appear as changed (by the way, each time I
hit refresh in git gui a different set of files appears as changed).
If not, then
git checkout utumno-wip
git rebase -i dev

and then selecting a commit to edit should be enough to trigger this bug.

Needless to say, I am well aware of things like
`git add --renormalize .`, but renormalizing is not the issue. The issue
is that _files show as changed and even a git reset --hard won't
convince git that nothing's changed_.

$ git reset --hard
HEAD is now at e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
$ git status
interactive rebase in progress; onto 02ae6f26
Last commands done (4 commands done):
   pick 3a39a0c0 Monkey patch for undecodable inis:
   pick e5c16790 Wip proper handling of ini tweaks encoding - TODOs:
  (see more in file .git/rebase-merge/done)
Next commands to do (19 remaining commands):
   edit a3a7b237 Amend last commit and linefixes:  ΕΕΕΕ
   edit 432fd314 fFF handle empty or malformed inis
  (use "git rebase --edit-todo" to view and edit)
You are currently editing a commit while rebasing branch 'utumno-wip'
on '02ae6f26'.
  (use "git commit --amend" to amend the current commit)
  (use "git rebase --continue" once you are satisfied with your changes)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

modified:   Mopy/Docs/Bash Readme Template.html

Untracked files:
  (use "git add <file>..." to include in what will be committed)

.DS_Store
.idea.7z

no changes added to commit (use "git add" and/or "git commit -a")
$

I really hope someone here can debug this
Thanks!

^ permalink raw reply

* Re: [PATCH v5 1/1] protocol: advertise multiple supported versions
From: Ævar Arnfjörð Bjarmason @ 2018-12-14 20:20 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, gitster, sbeller, jonathantanmy, szeder.dev,
	Johannes Schindelin, Jonathan Nieder, Jeff King
In-Reply-To: <60f6f2fbd8ee03b2d461803b9313b7473300eecc.1542407348.git.steadmon@google.com>


On Fri, Nov 16 2018, Josh Steadmon wrote:

I started looking at this to address
https://public-inbox.org/git/nycvar.QRO.7.76.6.1812141318520.43@tvgsbejvaqbjf.bet/
and was about to send a re-roll of my own series, but then...

> Currently the client advertises that it supports the wire protocol
> version set in the protocol.version config. However, not all services
> support the same set of protocol versions. For example, git-receive-pack
> supports v1 and v0, but not v2. If a client connects to git-receive-pack
> and requests v2, it will instead be downgraded to v0. Other services,
> such as git-upload-archive, do not do any version negotiation checks.
>
> This patch creates a protocol version registry. Individual client and
> server programs register all the protocol versions they support prior to
> communicating with a remote instance. Versions should be listed in
> preference order; the version specified in protocol.version will
> automatically be moved to the front of the registry.
>
> The protocol version registry is passed to remote helpers via the
> GIT_PROTOCOL environment variable.
>
> Clients now advertise the full list of registered versions. Servers
> select the first allowed version from this advertisement.
>
> Additionally, remove special cases around advertising version=0.
> Previously we avoided adding version negotiation fields in server
> responses if it looked like the client wanted v0. However, including
> these fields does not change behavior, so it's better to have simpler
> code.

...this paragraph is new in your v5, from the cover letter: "Changes
from V4: remove special cases around advertising version=0". However as
seen in the code & tests:

> [...]
>  static void push_ssh_options(struct argv_array *args, struct argv_array *env,
>  			     enum ssh_variant variant, const char *port,
> -			     enum protocol_version version, int flags)
> +			     const struct strbuf *version_advert, int flags)
>  {
> -	if (variant == VARIANT_SSH &&
> -	    version > 0) {
> +	if (variant == VARIANT_SSH) {
>  		argv_array_push(args, "-o");
>  		argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
> -		argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
> -				 version);
> +		argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
> +				 version_advert->buf);
>  	}
> [...]
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -346,7 +346,7 @@ expect_ssh () {
>
>  test_expect_success 'clone myhost:src uses ssh' '
>  	git clone myhost:src ssh-clone &&
> -	expect_ssh myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL" myhost src
>  '
>
>  test_expect_success !MINGW,!CYGWIN 'clone local path foo:bar' '
> @@ -357,12 +357,12 @@ test_expect_success !MINGW,!CYGWIN 'clone local path foo:bar' '
>
>  test_expect_success 'bracketed hostnames are still ssh' '
>  	git clone "[myhost:123]:src" ssh-bracket-clone &&
> -	expect_ssh "-p 123" myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL -p 123" myhost src
>  '
>
>  test_expect_success 'OpenSSH variant passes -4' '
>  	git clone -4 "[myhost:123]:src" ssh-ipv4-clone &&
> -	expect_ssh "-4 -p 123" myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL -4 -p 123" myhost src
>  '
>
>  test_expect_success 'variant can be overridden' '
> @@ -406,7 +406,7 @@ test_expect_success 'OpenSSH-like uplink is treated as ssh' '
>  	GIT_SSH="$TRASH_DIRECTORY/uplink" &&
>  	test_when_finished "GIT_SSH=\"\$TRASH_DIRECTORY/ssh\$X\"" &&
>  	git clone "[myhost:123]:src" ssh-bracket-clone-sshlike-uplink &&
> -	expect_ssh "-p 123" myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL -p 123" myhost src
>  '
>
>  test_expect_success 'plink is treated specially (as putty)' '
> @@ -446,14 +446,14 @@ test_expect_success 'GIT_SSH_VARIANT overrides plink detection' '
>  	copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink" &&
>  	GIT_SSH_VARIANT=ssh \
>  	git clone "[myhost:123]:src" ssh-bracket-clone-variant-1 &&
> -	expect_ssh "-p 123" myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL -p 123" myhost src
>  '
>
>  test_expect_success 'ssh.variant overrides plink detection' '
>  	copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink" &&
>  	git -c ssh.variant=ssh \
>  		clone "[myhost:123]:src" ssh-bracket-clone-variant-2 &&
> -	expect_ssh "-p 123" myhost src
> +	expect_ssh "-o SendEnv=GIT_PROTOCOL -p 123" myhost src
>  '
>
>  test_expect_success 'GIT_SSH_VARIANT overrides plink detection to plink' '
> @@ -488,7 +488,7 @@ test_clone_url () {
>  }
>
>  test_expect_success !MINGW 'clone c:temp is ssl' '
> -	test_clone_url c:temp c temp
> +	test_clone_url c:temp "-o SendEnv=GIT_PROTOCOL" c temp
>  '
>
>  test_expect_success MINGW 'clone c:temp is dos drive' '
> @@ -499,7 +499,7 @@ test_expect_success MINGW 'clone c:temp is dos drive' '
>  for repo in rep rep/home/project 123
>  do
>  	test_expect_success "clone host:$repo" '
> -		test_clone_url host:$repo host $repo
> +		test_clone_url host:$repo "-o SendEnv=GIT_PROTOCOL" host $repo
>  	'
>  done
>
> @@ -507,16 +507,16 @@ done
>  for repo in rep rep/home/project 123
>  do
>  	test_expect_success "clone [::1]:$repo" '
> -		test_clone_url [::1]:$repo ::1 "$repo"
> +		test_clone_url [::1]:$repo "-o SendEnv=GIT_PROTOCOL" ::1 "$repo"
>  	'
>  done
>  #home directory
>  test_expect_success "clone host:/~repo" '
> -	test_clone_url host:/~repo host "~repo"
> +	test_clone_url host:/~repo "-o SendEnv=GIT_PROTOCOL" host "~repo"
>  '
>
>  test_expect_success "clone [::1]:/~repo" '
> -	test_clone_url [::1]:/~repo ::1 "~repo"
> +	test_clone_url [::1]:/~repo "-o SendEnv=GIT_PROTOCOL" ::1 "~repo"
>  '
>
>  # Corner cases
> @@ -532,22 +532,22 @@ done
>  for tcol in "" :
>  do
>  	test_expect_success "clone ssh://host.xz$tcol/home/user/repo" '
> -		test_clone_url "ssh://host.xz$tcol/home/user/repo" host.xz /home/user/repo
> +		test_clone_url "ssh://host.xz$tcol/home/user/repo" "-o SendEnv=GIT_PROTOCOL" host.xz /home/user/repo
>  	'
>  	# from home directory
>  	test_expect_success "clone ssh://host.xz$tcol/~repo" '
> -	test_clone_url "ssh://host.xz$tcol/~repo" host.xz "~repo"
> +	test_clone_url "ssh://host.xz$tcol/~repo" "-o SendEnv=GIT_PROTOCOL" host.xz "~repo"
>  '
>  done
>
>  # with port number
>  test_expect_success 'clone ssh://host.xz:22/home/user/repo' '
> -	test_clone_url "ssh://host.xz:22/home/user/repo" "-p 22 host.xz" "/home/user/repo"
> +	test_clone_url "ssh://host.xz:22/home/user/repo" "-o SendEnv=GIT_PROTOCOL -p 22 host.xz" "/home/user/repo"
>  '
>
>  # from home directory with port number
>  test_expect_success 'clone ssh://host.xz:22/~repo' '
> -	test_clone_url "ssh://host.xz:22/~repo" "-p 22 host.xz" "~repo"
> +	test_clone_url "ssh://host.xz:22/~repo" "-o SendEnv=GIT_PROTOCOL -p 22 host.xz" "~repo"
>  '
>
>  #IPv6
> @@ -555,7 +555,7 @@ for tuah in ::1 [::1] [::1]: user@::1 user@[::1] user@[::1]: [user@::1] [user@::
>  do
>  	ehost=$(echo $tuah | sed -e "s/1]:/1]/" | tr -d "[]")
>  	test_expect_success "clone ssh://$tuah/home/user/repo" "
> -	  test_clone_url ssh://$tuah/home/user/repo $ehost /home/user/repo
> +	  test_clone_url ssh://$tuah/home/user/repo '-o SendEnv=GIT_PROTOCOL' $ehost /home/user/repo
>  	"
>  done
>
> @@ -564,7 +564,7 @@ for tuah in ::1 [::1] user@::1 user@[::1] [user@::1]
>  do
>  	euah=$(echo $tuah | tr -d "[]")
>  	test_expect_success "clone ssh://$tuah/~repo" "
> -	  test_clone_url ssh://$tuah/~repo $euah '~repo'
> +	  test_clone_url ssh://$tuah/~repo '-o SendEnv=GIT_PROTOCOL' $euah '~repo'
>  	"
>  done
>
> @@ -573,7 +573,7 @@ for tuah in [::1] user@[::1] [user@::1]
>  do
>  	euah=$(echo $tuah | tr -d "[]")
>  	test_expect_success "clone ssh://$tuah:22/home/user/repo" "
> -	  test_clone_url ssh://$tuah:22/home/user/repo '-p 22' $euah /home/user/repo
> +	  test_clone_url ssh://$tuah:22/home/user/repo '-o SendEnv=GIT_PROTOCOL -p 22' $euah /home/user/repo
>  	"
>  done
>
> @@ -582,7 +582,7 @@ for tuah in [::1] user@[::1] [user@::1]
>  do
>  	euah=$(echo $tuah | tr -d "[]")
>  	test_expect_success "clone ssh://$tuah:22/~repo" "
> -	  test_clone_url ssh://$tuah:22/~repo '-p 22' $euah '~repo'
> +	  test_clone_url ssh://$tuah:22/~repo '-o SendEnv=GIT_PROTOCOL -p 22' $euah '~repo'
>  	"
>  done

...so now we're unconditionally going to pass "-o SendEnv=GIT_PROTOCOL"
to "ssh" invocations. I don't have an issue with this, but given how the
commit message describes the change, this seems to have snuck under the
radar. You're only talking about always including the version in server
responses, nothing about the client now always needing
SendEnv=GIT_PROTOCOL, even with v0.
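
To make the behaviour change concrete, here is a simplified, standalone model of the `push_ssh_options()` hunk quoted above. It uses a plain `const char *` array instead of Git's `argv_array`, trims the variant enum down for illustration, ignores the environment-variable side, and `ssh_opts_old()`/`ssh_opts_new()` are hypothetical names: the old logic only added `-o SendEnv=GIT_PROTOCOL` for the OpenSSH variant when a protocol version above 0 was requested, while the v5 logic adds it to every OpenSSH invocation.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down, illustrative variant list (not Git's full enum). */
enum ssh_variant { VARIANT_AUTO, VARIANT_SSH, VARIANT_PLINK };

/* Hypothetical model of the pre-v5 check: only send the env for v1+. */
static size_t ssh_opts_old(enum ssh_variant variant, int version,
			   const char **args)
{
	size_t n = 0;

	assert(args != NULL);
	if (variant == VARIANT_SSH && version > 0) {
		args[n++] = "-o";
		args[n++] = "SendEnv=GIT_PROTOCOL";
	}
	return n;	/* number of arguments appended */
}

/* Hypothetical model of the v5 check: always send it to OpenSSH. */
static size_t ssh_opts_new(enum ssh_variant variant, const char **args)
{
	size_t n = 0;

	assert(args != NULL);
	if (variant == VARIANT_SSH) {
		args[n++] = "-o";
		args[n++] = "SendEnv=GIT_PROTOCOL";
	}
	return n;
}
```

With version 0, the old model appends nothing while the new one appends two arguments, which is exactly the difference the t5601 expectations above now encode.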

If the server always sends the version now, why don't you need to amend
the same t5400-send-pack.sh tests as in my "tests: mark & fix tests
broken under GIT_TEST_PROTOCOL_VERSION=1"? There's one that spews out
"version" there under my GIT_TEST_PROTOCOL_VERSION=1.

I was worried about this breaking GIT_SSH_COMMAND, but then I see that,
due to an interaction with picking "what ssh implementation?", we don't
pass "-G -o SendEnv=GIT_PROTOCOL" at all when I have a GIT_SSH_COMMAND,
but *do* pass it to my normal /usr/bin/ssh. Is this intended? Now if I
have a GIT_SSH_COMMAND that expects to wrap openssh, I need to pass
"-c ssh.variant=ssh", because "-c ssh.variant=auto" will now omit these
new arguments.

^ permalink raw reply

* Re: Git blame performance on files with a lot of history
From: Bryan Turner @ 2018-12-14 19:10 UTC (permalink / raw)
  To: clement.moyroud; +Cc: Git Users
In-Reply-To: <CABXAcUzoNJ6s3=2xZfWYQUZ_AUefwP=5UVUgMnafKHHtufzbSA@mail.gmail.com>

On Fri, Dec 14, 2018 at 10:29 AM Clement Moyroud
<clement.moyroud@gmail.com> wrote:
>
> Hello,
>
> My group at work is migrating a CVS repo to Git. The biggest issue we
> face so far is the performance of git blame, especially compared to
> CVS on the same file. One file especially causes us trouble: it's a
> 30k lines file with 25 years of history in 3k+ commits. The complete
> repo has 200k+ commits over that same period of time.

After you converted the repository from CVS to Git, did you run a manual repack?

The process of converting a repository from another SCM often results
in poor delta chain selections, which produce a repository that is
unnecessarily large on disk and/or performs quite slowly.

Something like `git repack -Adf --depth=50 --window=200` discards the
existing delta chains and chooses new ones, and may result in
significantly improved performance. A smaller depth, like --depth=20,
might result in even more performance improvement, but may also make
the repository larger on disk; you'll need to find the balance that
works for you.
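
Why the depth matters can be illustrated with a toy model (an editorial sketch, not how Git's pack code is organised): an object stored as a delta is rebuilt by reading its base and applying every delta along the chain, so a command such as `git blame`, which visits many versions of one file, can pay for long chains repeatedly. `objects_to_read()` and `NO_BASE` are hypothetical, the model assumes the chains contain no cycles, and it ignores Git's delta base cache.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BASE ((size_t)-1)

/*
 * Toy delta chain: base[i] is the index of the object that object i is
 * stored as a delta against, or NO_BASE for a full (non-delta) object.
 * Returns how many stored objects must be read to rebuild object i,
 * i.e. the chain depth plus one for the full base.
 */
static size_t objects_to_read(const size_t *base, size_t i)
{
	size_t reads = 1;

	assert(base != NULL);
	while (base[i] != NO_BASE) {
		i = base[i];
		reads++;
	}
	return reads;
}
```

For example, with `{ NO_BASE, 0, 1, 2 }` (one chain 0 <- 1 <- 2 <- 3) rebuilding object 3 reads four objects, while the shallower layout `{ NO_BASE, 0, 0, 2 }` needs three; a smaller `--depth` shortens chains at the cost of storing more full or less similar bases.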

Might be something worth testing, if you haven't?

Bryan

^ permalink raw reply

* [PATCH v3] log -G: ignore binary files
From: Thomas Braun @ 2018-12-14 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, avarab
In-Reply-To: <1535679074.141165.1542834055343@ox.hosteurope.de>

The -G<regex> option of log looks for the differences whose patch text
contains added/removed lines that match regex.

Currently -G also looks into the patches of binary files, and
according to [1] such patches are themselves binary.

This has a couple of issues:

- It makes the pickaxe search slow. In one of the author's proprietary
  repositories, with only ~5500 commits and a total .git size of ~300MB,
  searching takes ~13 seconds:

    $time git log -Gwave > /dev/null

    real    0m13,241s
    user    0m12,596s
    sys     0m0,644s

  whereas when we ignore binary files with this patch it takes ~4s

    $time ~/devel/git/git log -Gwave > /dev/null

    real    0m3,713s
    user    0m3,608s
    sys     0m0,105s

  which is a speedup of roughly 3.5 times (13.2s vs. 3.7s real time).

- The internally used algorithm for generating patch text is based on
  xdiff, which states in [1]:

  > The output format of the binary patch file is proprietary
  > (and binary) and it is basically a collection of copy and insert
  > commands [..]

  which means that the current format could change once the internal
  algorithm is changed, as the format is not standardized. In addition,
  the git binary patch format used for preparing patches for git apply
  is *different* from the xdiff format, as can be seen by comparing
  git log -p -a

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84..edfeb6f 100644
    --- a/data.bin
    +++ b/data.bin
    @@ -1,2 +1,4 @@
     a
     a^@a
    +a
    +a^@a

  with git log --binary

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
    GIT binary patch
    literal 12
    QcmYe~N@Pgn0zx1O01)N^ZvX%Q

    literal 6
    NcmYe~N@Pgn0ssWg0XP5v

  which seems unexpected.

To resolve these issues, this patch makes -G<regex> ignore binary files
by default. Textconv filters are supported, and -a/--text can be used
to get the old and broken behaviour back.
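
The rule just described can be modelled in a few lines. This is a simplified sketch, not the patch's code (that is the `diffcore-pickaxe.c` hunk below), and it assumes -G is in effect. `looks_binary()` approximates Git's content sniffing (a NUL byte within the first 8000 bytes means binary) while ignoring the `diff`/`binary` attributes, and `g_should_skip()` is a hypothetical name for the decision the hunk adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FIRST_FEW_BYTES 8000

/* Approximation of Git's heuristic: a NUL early on means binary. */
static bool looks_binary(const char *buf, size_t size)
{
	assert(buf != NULL || size == 0);
	if (size > FIRST_FEW_BYTES)
		size = FIRST_FEW_BYTES;
	return size && memchr(buf, '\0', size) != NULL;
}

/*
 * Hypothetical model of the new -G rule: skip a file pair when either
 * side is binary and has no textconv filter, unless --text (-a) is given.
 */
static bool g_should_skip(bool text_flag,
			  bool textconv_one, const char *one, size_t one_len,
			  bool textconv_two, const char *two, size_t two_len)
{
	if (text_flag)
		return false;
	return (!textconv_one && looks_binary(one, one_len)) ||
	       (!textconv_two && looks_binary(two, two_len));
}
```

A creation event (empty preimage) of the `a\na\0a\n` blob used in the new tests is skipped by default, but not with `-a` or with a textconv filter on the binary side.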

The -S<block of text> option of log looks for differences that change
the number of occurrences of the specified block of text (i.e.
addition/deletion) in a file. As we want to keep the current behaviour
for -S, add a test to ensure it stays that way.
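
The contrast with -G can be sketched as follows, under the simplifying assumption that the search string is a literal (no `--pickaxe-regex`): -S reports a file pair only when the number of occurrences differs between the old and new content. `count_occurrences()` and `s_matches()` are hypothetical helpers; because they count over raw byte buffers rather than diff lines, binary content is searched too, consistent with the documentation line this patch adds for -S.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in a byte buffer. */
static size_t count_occurrences(const char *buf, size_t len,
				const char *needle)
{
	size_t n = strlen(needle), i = 0, count = 0;

	assert(n > 0);
	while (n <= len && i <= len - n) {
		if (memcmp(buf + i, needle, n) == 0) {
			count++;
			i += n;		/* skip past this match */
		} else {
			i++;
		}
	}
	return count;
}

/* Hypothetical model of -S: match when the occurrence count changes. */
static bool s_matches(const char *one, size_t one_len,
		      const char *two, size_t two_len, const char *needle)
{
	return count_occurrences(one, one_len, needle) !=
	       count_occurrences(two, two_len, needle);
}
```

Appending `a\na\0a\n` to itself changes the count of "a" from 3 to 6, so -S matches; merely moving text around (for instance "ab" to "ba") keeps the count and does not, which is the case where -G would still report the change.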

[1]: http://www.xmailserver.org/xdiff.html

Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
---

Changes since v2:
 - Introduce a setup step for the new tests 
 - Really start with a clean history in the tests
 - Added more complex commit history for the tests
 - Use test_when_finished for cleanup instead of doing nothing
 - Enhanced commit message to motivate the change better
 - Added some more documentation

 Documentation/diff-options.txt |  5 +++++
 Documentation/gitdiffcore.txt  |  3 ++-
 diffcore-pickaxe.c             |  6 ++++++
 t/t4209-log-pickaxe.sh         | 35 ++++++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..b94d332f71 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -524,6 +524,8 @@ struct), and want to know the history of that block since it first
 came into being: use the feature iteratively to feed the interesting
 block in the preimage back into `-S`, and keep going until you get the
 very first version of the block.
++
+Binary files are searched as well.
 
 -G<regex>::
 	Look for differences whose patch text contains added/removed
@@ -543,6 +545,9 @@ While `git log -G"regexec\(regexp"` will show this commit, `git log
 -S"regexec\(regexp" --pickaxe-regex` will not (because the number of
 occurrences of that string did not change).
 +
+Unless `--text` is supplied patches of binary files without a textconv
+filter will be ignored.
++
 See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
 information.
 
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index c0a60f3158..c970d9fe43 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -242,7 +242,8 @@ textual diff has an added or a deleted line that matches the given
 regular expression.  This means that it will detect in-file (or what
 rename-detection considers the same file) moves, which is noise.  The
 implementation runs diff twice and greps, and this can be quite
-expensive.
+expensive.  To speed things up binary files without textconv filters
+will be ignored.
 
 When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
 that match their respective criterion are kept in the output.  When
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 69fc55ea1e..4cea086f80 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -154,6 +154,12 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 	if (textconv_one == textconv_two && diff_unmodified_pair(p))
 		return 0;
 
+	if ((o->pickaxe_opts & DIFF_PICKAXE_KIND_G) &&
+	    !o->flags.text &&
+	    ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
+	     (!textconv_two && diff_filespec_is_binary(o->repo, p->two))))
+		return 0;
+
 	mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
 	mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
 
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 844df760f7..5d06f5f45e 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,4 +106,39 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
+test_expect_success 'setup log -[GS] binary & --text' '
+	git checkout --orphan GS-binary-and-text &&
+	git read-tree --empty &&
+	printf "a\na\0a\n" >data.bin &&
+	git add data.bin &&
+	git commit -m "create binary file" data.bin &&
+	printf "a\na\0a\n" >>data.bin &&
+	git commit -m "modify binary file" data.bin &&
+	git rm data.bin &&
+	git commit -m "delete binary file" data.bin &&
+	git log >full-log
+'
+
+test_expect_success 'log -G ignores binary files' '
+	git log -Ga >log &&
+	test_must_be_empty log
+'
+
+test_expect_success 'log -G looks into binary files with -a' '
+	git log -a -Ga >log &&
+	test_cmp log full-log
+'
+
+test_expect_success 'log -G looks into binary files with textconv filter' '
+	test_when_finished "rm .gitattributes" &&
+	echo "* diff=bin" >.gitattributes &&
+	git -c diff.bin.textconv=cat log -Ga >log &&
+	test_cmp log full-log
+'
+
+test_expect_success 'log -S looks into binary files' '
+	git log -Sa >log &&
+	test_cmp log full-log
+'
+
 test_done
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* Re: [PATCH 1/1] worktree refs: fix case sensitivity for 'head'
From: Jacob Keller @ 2018-12-14 18:48 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Mike Rappazzo, Stefan Beller, gitgitgadget, Git mailing list,
	Junio C Hamano
In-Reply-To: <CACsJy8COqOMEk3Wzr==1-hsuGqdgKnbNfG_c90+xpU_oS-bW6A@mail.gmail.com>

On Fri, Dec 14, 2018 at 9:47 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Dec 14, 2018 at 6:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Fri, Dec 14, 2018 at 6:22 PM Jacob Keller <jacob.keller@gmail.com> wrote:
> > >
> > > On Thu, Dec 13, 2018 at 11:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > > > Even with a new ref storage, I'm pretty sure pseudo refs like HEAD,
> > > > FETCH_HEAD... will forever be backed by filesystem. HEAD for example
> > > > is part of the repository signature and must exist as a file. We could
> > > > also lookup pseudo refs with readdir() instead of lstat(). On
> > > > case-preserving-and-insensitive filesystems, we can reject "head" this
> > > > way. But that comes with a high cost.
> > > > --
> > > > Duy
> > >
> > > Once other refs are backed by something that doesn't depend on
> > > filesystem case sensitivity, you could enforce that we only accept
> > > all-caps HEAD as a pseudo ref, and always look up other spellings in
> > > the other refs backend, though, right?
> >
> > Hmm.. yes. I don't know off hand if we have any pseudo refs in
> > lowercase. Unlikely so yes this should work.
>
> One thing we could do _today_ without waiting for a new refs backend
> is, avoid looking up pseudo refs if the given ref name is not
> all-caps. So "head" (or hEAd) can match refs/head, refs/tags/head,
> refs/heads/head but never $GIT_DIR/HEAD. And yes I checked the code,
> pseudo refs must be all-caps.
> --
> Duy

Right, I think that's a good start, at least for these pseudo refs. It
doesn't solve the more general case of refs mismatching, but it
prevents the obvious one where case actually matters, by preventing
head from looking up as HEAD.

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH 1/1] worktree refs: fix case sensitivity for 'head'
From: Jacob Keller @ 2018-12-14 18:47 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Mike Rappazzo, Stefan Beller, gitgitgadget, Git mailing list,
	Junio C Hamano
In-Reply-To: <CACsJy8CT8K9SHnTsJT4HrxAK95yTz-x2DnNRBYKkvMyGbBZCgg@mail.gmail.com>

On Fri, Dec 14, 2018 at 9:38 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Dec 14, 2018 at 6:22 PM Jacob Keller <jacob.keller@gmail.com> wrote:
> >
> > On Thu, Dec 13, 2018 at 11:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > > Even with a new ref storage, I'm pretty sure pseudo refs like HEAD,
> > > FETCH_HEAD... will forever be backed by filesystem. HEAD for example
> > > is part of the repository signature and must exist as a file. We could
> > > also lookup pseudo refs with readdir() instead of lstat(). On
> > > case-preserving-and-insensitive filesystems, we can reject "head" this
> > > way. But that comes with a high cost.
> > > --
> > > Duy
> >
> > Once other refs are backed by something that doesn't depend on
> > filesystem case sensitivity, you could enforce that we only accept
> > all-caps HEAD as a pseudo ref, and always look up other spellings in
> > the other refs backend, though, right?
>
> Hmm.. yes. I don't know off hand if we have any pseudo refs in
> lowercase. Unlikely so yes this should work.
>

I think even if we had lowercase pseudo refs, as long as we know which
identifiers represent pseudo refs, and we don't have two variants
which match if compared case insensitively, we shouldn't have
ambiguity, since we'd distinguish whether to check a pseudo ref spot
before we actually check the file system.
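
The dispatch described here can be sketched as a step that runs before any filesystem access: if a name exactly matches one of a known list of pseudo ref identifiers, read it from `$GIT_DIR`; otherwise hand it to the other refs backend. The list and the `lookup_route()` name are hypothetical and illustrative only, and the sketch assumes, as above, that no two known pseudo refs differ only in case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ref_route { ROUTE_PSEUDOREF_FILE, ROUTE_REFS_BACKEND };

/* Hypothetical, illustrative list; not Git's authoritative set. */
static const char *const known_pseudorefs[] = {
	"HEAD", "FETCH_HEAD", "ORIG_HEAD", "MERGE_HEAD", "CHERRY_PICK_HEAD",
};

/*
 * Decide where to look up `name` without touching the filesystem. The
 * comparison is deliberately case-sensitive, so "head" never reaches
 * the HEAD file even on a case-insensitive filesystem.
 */
static enum ref_route lookup_route(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(known_pseudorefs) / sizeof(*known_pseudorefs); i++)
		if (!strcmp(name, known_pseudorefs[i]))
			return ROUTE_PSEUDOREF_FILE;
	return ROUTE_REFS_BACKEND;
}
```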

> > So, yea the actual file may not
> > be case sensitive, but we would never create refs/head anymore for any
> > reason, so there would be no ambiguity if reading the refs/head vs
> > refs/HEAD on a case insensitive file system, since refs/head would no
> > longer be a legitimate ref stored as a file if you used a different
> > refs backend.
> >
> > Basically, we'd be looking up HEAD by checking the file, but we'd stop
> > looking up head, hEAd, etc in the files, and instead use whatever
> > other refs backend for non-pseudo refs. Thus, it wouldn't matter,
> > since we'd never actually lookup the other spellings of HEAD as a
> > file. Wouldn't that solve the ambiguity, at least once a repository
> > has fully switched to some alternative refs backend for non-pseudo
> > refs? (Unless I mis-understand and refs/head could be an added pseudo
> > ref?)
>
> No I think "pseudo refs" are those outside "refs" directory only. So
> "refs/head" would be a "normal" ref.
>

Right, I was a bit confused pre-coffee and forgot why a ref was a pseudo ref.

> > Jake
>
>
>
> --
> Duy

^ permalink raw reply

* Re: [PATCH v2] log -G: Ignore binary files
From: Thomas Braun @ 2018-12-14 18:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, sbeller, avarab
In-Reply-To: <xmqq5zwgny9d.fsf@gitster-ct.c.googlers.com>

> Junio C Hamano <gitster@pobox.com> hat am 29. November 2018 um 08:22 geschrieben:
> 
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> +test_expect_success 'log -G ignores binary files' '
> >> +	git checkout --orphan orphan1 &&
> >> +	printf "a\0a" >data.bin &&
> >> +	git add data.bin &&
> >> +	git commit -m "message" &&
> >> +	git log -Ga >result &&
> >> +	test_must_be_empty result
> >> +'
> >
> > As this is the first mention of data.bin, this is adding a new file
> > data.bin that has two 'a' but is a binary file.  And that is the
> > only commit in the history leading to orphan1.
> >
> > The fact that "log -Ga" won't find any means it missed the creation
> > event, because the blob is binary.  Good.
> 
> By the way, this root commit records another file whose path is
> "file" and has "Picked<LF>" in it.  If the file had 'a' in it, it
> would have been included in "git log" output, but that is too subtle
> a point to be noticed by the readers who are only reading this patch
> without seeing what has been done to the index before this test
> piece.
> 
> If you are going to restructure these tests to create a three-commit
> history in a single expect_success that is inspected with various
> "log -Ga" invocations in subsequent tests, it is worth removing that
> other file (or rather, starting with "read-tree --empty" immediately
> after checking out the orphan branch, to clarify to the readers that
> there is nothing but what you add in the set-up step in the index)
> to make the test more robust.

Thanks for the explanation. At first I thought that "checkout --orphan"
already took care of everything, but "read-tree --empty" is the way to go.

Done.

^ permalink raw reply

* Re: [PATCH v2] log -G: Ignore binary files
From: Thomas Braun @ 2018-12-14 18:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, sbeller, avarab
In-Reply-To: <xmqqa7lsnyu5.fsf@gitster-ct.c.googlers.com>

> Junio C Hamano <gitster@pobox.com> hat am 29. November 2018 um 08:10 geschrieben:
> 
> 
> Thomas Braun <thomas.braun@virtuell-zuhause.de> writes:
> 
> > Subject: Re: [PATCH v2] log -G: Ignore binary files
> 
> s/Ig/ig/; (will locally munge--this alone is no reason to reroll).

Done.
 
> The code changes looked sensible.

Thanks.

> > diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> > index 844df760f7..5c3e2a16b2 100755
> > --- a/t/t4209-log-pickaxe.sh
> > +++ b/t/t4209-log-pickaxe.sh
> > @@ -106,4 +106,44 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
> >  	rm .gitattributes
> >  '
> >  
> > +test_expect_success 'log -G ignores binary files' '
> > +	git checkout --orphan orphan1 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -Ga >result &&
> > +	test_must_be_empty result
> > +'
> 
> As this is the first mention of data.bin, this is adding a new file
> data.bin that has two 'a' but is a binary file.  And that is the
> only commit in the history leading to orphan1.
> 
> The fact that "log -Ga" won't find any means it missed the creation
> event, because the blob is binary.  Good.
> 
> > +test_expect_success 'log -G looks into binary files with -a' '
> > +	git checkout --orphan orphan2 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> 
> This starts from the state left by the previous test piece, i.e. we
> have a binary data.bin file with two 'a' in it.  We pretend to
> modify and add, but these two steps are no-op if the previous
> succeeded, but even if the previous step failed, we get what we want
> in the data.bin file.  And then we make an initial commit the same
> way.
> 
> > +	git log -a -Ga >actual &&
> > +	git log >expected &&
> 
> And we ran the same test but this time with "-a" to tell Git that
> binary-ness should not matter.  It will find the sole commit.  Good.
> 
> > +	test_cmp actual expected
> > +'
> > +
> > +test_expect_success 'log -G looks into binary files with textconv filter' '
> > +	git checkout --orphan orphan3 &&
> > +	echo "* diff=bin" > .gitattributes &&
> 
> s/> />/; (will locally munge--this alone is no reason to reroll).

Done.

> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git -c diff.bin.textconv=cat log -Ga >actual &&
> 
> This exposes a slight iffy-ness in the design.  The textconv filter
> used here does not strip the "binary-ness" from the payload, but it
> is enough to tell the machinery that -G should look into the
> difference.  Is that really desirable, though?
> 
> IOW, if this weren't the initial commit (which is handled by the
> codepath to special-case creation and deletion in diff_grep()
> function), would "log -Ga" show it without "-a"?  Should it?

Yes, with cat as the textconv filter and without "-a", "log -Ga" finds all
three commits (creation, modification, deletion) present in v3.

I can make that more explicit with a textconv filter that removes the binary-ness:

git -c diff.bin.textconv="sed -e \"s/\x00//g\"" log -Ga >log &&

(diff.bin.textconv="cat -v" works here as well but seems non-portable)

Now we could also search for "aa", as the NUL separating them is gone, but would
that be getting too clever?
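To illustrate why a NUL-stripping textconv filter removes the binary-ness, here is a minimal C sketch. It assumes the binary check is the NUL-in-the-first-8000-bytes heuristic used by buffer_is_binary() in git's xdiff-interface.c, and it ignores gitattributes such as `binary` or `-diff`. looks_binary() and strip_nul() are hypothetical names; strip_nul() only stands in for the sed filter above and is not git code.

```c
#include <assert.h>
#include <string.h>

/* Assumed heuristic, modelled on buffer_is_binary() in git's
 * xdiff-interface.c: a buffer counts as binary if a NUL byte appears
 * within its first 8000 bytes. gitattributes are ignored here. */
#define FIRST_FEW_BYTES 8000

static int looks_binary(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len > FIRST_FEW_BYTES)
		len = FIRST_FEW_BYTES;
	return memchr(buf, '\0', len) != NULL;
}

/* Hypothetical stand-in for the sed "s/\x00//g" textconv filter:
 * copies every non-NUL byte of in[0..len) to out and returns the
 * number of bytes written. out must have room for len bytes. */
static size_t strip_nul(const char *in, size_t len, char *out)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++)
		if (in[i] != '\0')
			out[n++] = in[i];
	return n;
}
```

Applied to the three bytes written by printf "a\0a", looks_binary() returns 1; the stripped result "aa" contains no NUL, so looks_binary() returns 0 for it.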

> I think this test piece (and probably the previous ones for "-a" vs
> "no -a" without textconv, as well) should be using a history with
> three commits, where
> 
>     - the root commit introduces "a\0a" to data.bin (creation event)
> 
>     - the second commit adds another instance of "a\0a" to data.bin
>       (forces comparison)
> 
>     - the third commit removes data.bin (deletion event)
> 
> and make sure that the three are treated identically.  If "log -Ga"
> finds one (with the combination of other conditions like use of
> textconv or -a option), it should find all three, and vice versa.

Good point. I've added that.
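The three-commit history proposed above can be modelled in a few lines of C. This sketch assumes -S compares the number of non-overlapping occurrences of the string in the preimage and the postimage, with a creation or deletion event modelled as an empty side. count_occurrences() and s_matches() are hypothetical names, not functions from diffcore-pickaxe.c. With "a" as the needle the counts go from 0 to 2, 2 to 4 and 4 to 0, so all three commits are expected to match.

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of needle[0..nlen) in buf[0..len).
 * memcmp() is used instead of strstr() because the blobs contain NULs. */
static size_t count_occurrences(const char *buf, size_t len,
				const char *needle, size_t nlen)
{
	size_t count = 0, i = 0;

	assert(nlen > 0);
	while (i + nlen <= len) {
		if (memcmp(buf + i, needle, nlen) == 0) {
			count++;
			i += nlen;
		} else {
			i++;
		}
	}
	return count;
}

/* Simplified -S rule (assumption): a filepair matches when the count
 * differs between preimage and postimage. A missing side, as in a
 * creation or deletion event, is passed as an empty buffer. */
static int s_matches(const char *pre, size_t pre_len,
		     const char *post, size_t post_len,
		     const char *needle, size_t nlen)
{
	return count_occurrences(pre, pre_len, needle, nlen) !=
	       count_occurrences(post, post_len, needle, nlen);
}
```

A pair whose count does not change, such as an unmodified blob, does not match under this rule.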

> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> > +
> > +test_expect_success 'log -S looks into binary files' '
> > +	git checkout --orphan orphan4 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -Sa >actual &&
> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> 
> Likewise.  This would also benefit from a three-commit history.
> 
> Perhaps you can create such a history at the beginning of these
> additions as another "setup -G/-S binary test" step and test
> different variations in subsequent tests without the setup?

Done.

^ permalink raw reply

* Re: t5601 breakage at 3cd325f7be (Merge branch 'js/protocol-advertise-multi' into pu, 2018-12-14)
From: Ævar Arnfjörð Bjarmason @ 2018-12-14 18:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Josh Steadmon, gitster
In-Reply-To: <nycvar.QRO.7.76.6.1812141318520.43@tvgsbejvaqbjf.bet>


On Fri, Dec 14 2018, Johannes Schindelin wrote:

> Hi,
>
> this morning Travis sounded quite a few claxons:
> https://travis-ci.org/git/git/builds/467839114
>
> It seems that quite a few tests in t5601-clone.sh fail, the first of which
> reads like this:
>
> -- snip --
> expecting success:
> 	git clone myhost:src ssh-clone &&
> 	expect_ssh "-o SendEnv=GIT_PROTOCOL" myhost src
>
> ++ git clone myhost:src ssh-clone
> Cloning into 'ssh-clone'...
> ++ expect_ssh '-o SendEnv=GIT_PROTOCOL' myhost src
> ++ test_when_finished '
> 		(cd "$TRASH_DIRECTORY" && rm -f ssh-expect ssh-output.munged && >ssh-output)
> 	'
> ++ test 0 = 0
> ++ test_cleanup='{
> 		(cd "$TRASH_DIRECTORY" && rm -f ssh-expect ssh-output.munged && >ssh-output)
>
> 		} && (exit "$eval_ret"); eval_ret=$?; :'
> ++ case "$#" in
> ++ echo 'ssh: -o SendEnv=GIT_PROTOCOL myhost git-upload-pack '\''src'\'''
> ++ cd '/Users/vsts/agent/2.144.0/work/1/s/t/trash directory.t5601-clone'
> ++ sed 's/ssh: -o SendEnv=GIT_PROTOCOL /ssh: /'
> ++ mv ssh-output.munged ssh-output
> ++ test_cmp ssh-expect ssh-output
> ++ diff -u ssh-expect ssh-output
> --- ssh-expect	2018-12-14 04:30:28.000000000 +0000
> +++ ssh-output	2018-12-14 04:30:28.000000000 +0000
> @@ -1 +1 @@
> -ssh: -o SendEnv=GIT_PROTOCOL myhost git-upload-pack 'src'
> +ssh: myhost git-upload-pack 'src'
> error: last command exited with $?=1
> not ok 37 - clone myhost:src uses ssh
> #
> #		git clone myhost:src ssh-clone &&
> #		expect_ssh "-o SendEnv=GIT_PROTOCOL" myhost src
> #
> -- snap --
>
> I've bisected this down to 3cd325f7be (Merge branch
> 'js/protocol-advertise-multi' into pu, 2018-12-14), a merge, meaning that
> two topic branches do not play nice with one another.
>
> Staring at the breakage and the changes involved, I suspected that
> 391985d7c7 (tests: mark & fix tests broken under
> GIT_TEST_PROTOCOL_VERSION=1, 2018-12-13) does not play well with the
> merged 24c10f7473 (protocol: advertise multiple supported versions,
> 2018-11-16), and indeed, reverting 391985d7c7 on top of 3cd325f7be lets
> t5601 pass again.
>
> It would appear to me, then, that these two patches step on each other's
> toes. Josh, Ævar, what should be done about this?

Looking at the two, the breakage is on my side; I just got away with it
before. I'm re-rolling mine for this & other fixes, and will make sure
the two play well together. Thanks.

^ permalink raw reply

* Re: [PATCH v2] log -G: Ignore binary files
From: Thomas Braun @ 2018-12-14 18:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, gitster, peff, sbeller
In-Reply-To: <87a7ltz7jh.fsf@evledraar.gmail.com>

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on 28 November 2018 at 13:54:
> 
> 
> 
> On Wed, Nov 28 2018, Thomas Braun wrote:
> 
> Looks much better this time around.

Thanks.
 
> > The -G<regex> option of log looks for the differences whose patch text
> > contains added/removed lines that match regex.
> >
> > As the concept of patch text only makes sense for text files, we need to
> > ignore binary files when searching with -G <regex> as well.
> >
> > The -S<block of text> option of log looks for differences that change
> > the number of occurrences of the specified block of text (i.e.
> > addition/deletion) in a file. As we want to keep the current behaviour,
> > add a test to ensure it.
> > [...]
> > diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> > index c0a60f3158..059ddd3431 100644
> > --- a/Documentation/gitdiffcore.txt
> > +++ b/Documentation/gitdiffcore.txt
> > @@ -242,7 +242,7 @@ textual diff has an added or a deleted line that matches the given
> >  regular expression.  This means that it will detect in-file (or what
> >  rename-detection considers the same file) moves, which is noise.  The
> >  implementation runs diff twice and greps, and this can be quite
> > -expensive.
> > +expensive.  Binary files without textconv filter are ignored.
> 
> Now that we support --text that should be documented. I tried to come up
> with something on top:
> 
>     diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>     index 0378cd574e..42ae65fb57 100644
>     --- a/Documentation/diff-options.txt
>     +++ b/Documentation/diff-options.txt
>     @@ -524,6 +524,10 @@ struct), and want to know the history of that block since it first
>      came into being: use the feature iteratively to feed the interesting
>      block in the preimage back into `-S`, and keep going until you get the
>      very first version of the block.
>     ++
>     +Unlike `-G` the `-S` option will always search through binary files
>     +without a textconv filter. [[TODO: Don't we want to support --no-text
>     +then as an optimization?]].
> 
>      -G<regex>::
>      	Look for differences whose patch text contains added/removed
>     @@ -545,6 +549,15 @@ occurrences of that string did not change).
>      +
>      See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
>      information.
>     ++
>     +Unless `--text` is supplied, binary files without a textconv filter
>     +will be ignored.  This was not the case before Git version 2.21.
>     ++
>     +With `--text`, instead of patch lines we <some example similar to the
>     +above diff showing what we actually do for binary files. [[TODO: How
>     +does that work? Could just link to the "diffcore-pickaxe: For
>     +Detecting Addition/Deletion of Specified String" section in
>     +gitdiffcore(7) which could explain it]]
> 
>      --find-object=<object-id>::
>      	Look for differences that change the number of occurrences of
>     diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
>     index c0a60f3158..26880b4149 100644
>     --- a/Documentation/gitdiffcore.txt
>     +++ b/Documentation/gitdiffcore.txt
>     @@ -251,6 +251,10 @@ criterion in a changeset, the entire changeset is kept.  This behavior
>      is designed to make reviewing changes in the context of the whole
>      changeset easier.
> 
>     +Both `-S` and `-G` will ignore binary files without a textconv filter
>     +by default; this can be overridden with `--text`. With `--text` the
>     +binary patch we look through is generated as [[TODO: ???]].
>     +
>      diffcore-order: For Sorting the Output Based on Filenames
>      ---------------------------------------------------------
> 
> But as you can see given the TODO comments I don't know how this works
> exactly. I *could* dig, but that's my main outstanding problem with this
> patch, the commit message / docs aren't being updated to reflect the new
> behavior.

v3 will have some more documentation, which took inspiration from your sketches here.
I've not included a reference to Git version 2.21, in which this patch will hopefully
land, as that does not seem to be common in the documentation.

I see tweaking the behaviour of -S as outside the scope of this patch series.
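The condition the v2 patch adds to pickaxe_match() (quoted further down in this message) can be restated as a small predicate. This is a hypothetical simplification: struct side and enum pickaxe_kind stand in for diff_filespec, the textconv lookup and the DIFF_PICKAXE_KIND_G flag, and is_binary is assumed to be computed elsewhere. It makes the asymmetry discussed here explicit: only -G is gated, while --text, or a textconv filter on the binary side, lifts the gate.

```c
#include <assert.h>

/* Hypothetical, simplified view of one side of a filepair. */
struct side {
	int has_textconv; /* a textconv filter applies to this path */
	int is_binary;    /* the blob was detected as binary */
};

enum pickaxe_kind { PICKAXE_S, PICKAXE_G };

/* Returns 1 if the pair is skipped before its contents are searched.
 * Mirrors the condition in the v2 patch: only -G is affected, --text
 * disables the check, and a side counts only if it is binary and has
 * no textconv filter. */
static int pickaxe_skips_pair(enum pickaxe_kind kind, int text,
			      struct side one, struct side two)
{
	assert(kind == PICKAXE_S || kind == PICKAXE_G);
	if (kind != PICKAXE_G || text)
		return 0;
	return (!one.has_textconv && one.is_binary) ||
	       (!two.has_textconv && two.is_binary);
}
```

For example, a -G search over a text preimage and a binary postimage is skipped, but the same pair is searched with --text, and -S never skips it.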
 
> I.e. let's leave the docs in some state where the reader can as
> unambiguously know what to expect with -G and these binary diffs we've
> been implicitly supporting as with the textual diffs. Ideally with some
> examples of how to generate them (re my question about the base85 output
> in v1).
> 
> Part of that's obviously behavior we've had all along, but it's much
> more convincing to say:
> 
>     We are changing X which we've done for ages, it works exactly like
>     this, and here's a switch to get it back.
> 
> Instead of:
> 
>     X doesn't make sense, let's turn it off.
> 
> Also the diffcore docs already say stuff about how slow/fast things are,
> and in a side-thread you said:
> 
>     My main motivation is to speed up "log -G" as that takes a
>     considerable amount of time when it wades through MBs of binary
>     files which change often.
> 
> Makes sense, but then let's say something about that in that section of
> the docs.

Done.

> >  When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
> >  that match their respective criterion are kept in the output.  When
> > diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> > index 69fc55ea1e..4cea086f80 100644
> > --- a/diffcore-pickaxe.c
> > +++ b/diffcore-pickaxe.c
> > @@ -154,6 +154,12 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
> >  	if (textconv_one == textconv_two && diff_unmodified_pair(p))
> >  		return 0;
> >
> > +	if ((o->pickaxe_opts & DIFF_PICKAXE_KIND_G) &&
> > +	    !o->flags.text &&
> > +	    ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
> > +	     (!textconv_two && diff_filespec_is_binary(o->repo, p->two))))
> > +		return 0;
> > +
> >  	mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
> >  	mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
> >
> > diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> > index 844df760f7..5c3e2a16b2 100755
> > --- a/t/t4209-log-pickaxe.sh
> > +++ b/t/t4209-log-pickaxe.sh
> > @@ -106,4 +106,44 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
> >  	rm .gitattributes
> >  '
> >
> > +test_expect_success 'log -G ignores binary files' '
> > +	git checkout --orphan orphan1 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -Ga >result &&
> > +	test_must_be_empty result
> > +'
> > +
> > +test_expect_success 'log -G looks into binary files with -a' '
> > +	git checkout --orphan orphan2 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -a -Ga >actual &&
> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> 
> A large part of the question(s) I have above & future readers would
> presumably have would be answered by these tests using more realistic
> test data. I.e. also with \n in there to see whether -G is also
> line-based in this binary case.
> 
> > +test_expect_success 'log -G looks into binary files with textconv filter' '
> > +	git checkout --orphan orphan3 &&
> > +	echo "* diff=bin" > .gitattributes &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git -c diff.bin.textconv=cat log -Ga >actual &&
> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> > +
> > +test_expect_success 'log -S looks into binary files' '
> > +	git checkout --orphan orphan4 &&
> > +	printf "a\0a" >data.bin &&
> > +	git add data.bin &&
> > +	git commit -m "message" &&
> > +	git log -Sa >actual &&
> > +	git log >expected &&
> > +	test_cmp actual expected
> > +'
> > +
> >  test_done

Done.
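Ævar's question above about whether -G stays line-based can be pictured with a simplified C model: only the '+' and '-' lines of a unified diff are tested against the regex, while context lines and the ---/+++ headers are not. g_matches() is a hypothetical helper built on POSIX regcomp()/regexec(); it is not git's diff_grep(), whose details may differ.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Simplified model of -G (not git's diff_grep()): test only the added
 * ('+') and removed ('-') lines of a unified diff against an extended
 * regex. Lines starting with "+++"/"---" are treated as file headers
 * and skipped (so a removed line whose text begins with "--" is missed),
 * and lines longer than the local buffer are ignored.
 * Returns 1 on a match, 0 if none, -1 if the pattern does not compile. */
static int g_matches(const char *diff, const char *pattern)
{
	regex_t re;
	const char *line = diff;
	int found = 0;

	assert(diff != NULL && pattern != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	while (*line && !found) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[1024];

		if ((line[0] == '+' || line[0] == '-') &&
		    strncmp(line, "+++", 3) != 0 &&
		    strncmp(line, "---", 3) != 0 &&
		    len - 1 < sizeof(buf)) {
			memcpy(buf, line + 1, len - 1); /* drop the +/- marker */
			buf[len - 1] = '\0';
			found = regexec(&re, buf, 0, NULL, 0) == 0;
		}
		line = end ? end + 1 : line + len;
	}
	regfree(&re);
	return found;
}
```

For a hunk containing "-old line" and "+new value", both "old" and "value" match, while a pattern that only appears in a context line or in the file headers does not.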

> These tests have way too much repeated boilerplate for no reason. This
> could just be (as-is, without the better test data suggested above):
> 
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index 844df760f7..23ed6cc4b1 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -106,4 +106,34 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
>  	rm .gitattributes
>  '
> 
> +test_expect_success 'setup log -[GS] binary & --text' '
> +	git checkout --orphan GS-binary-and-text &&
> +	printf "a\0a" >data.bin &&
> +	git add data.bin &&
> +	git commit -m "message" &&
> +	git log >full-log
> +'
> +
> +test_expect_success 'log -G ignores binary files' '
> +	git log -Ga >result &&
> +	test_must_be_empty result
> +'
> +
> +test_expect_success 'log -G looks into binary files with -a' '
> +	git log -a -Ga >actual &&
> +	test_cmp actual full-log
> +'
> +
> +test_expect_success 'log -G looks into binary files with textconv filter' '
> +	echo "* diff=bin" >.gitattributes &&
> +	git -c diff.bin.textconv=cat log -Ga >actual &&
> +	test_cmp actual full-log
> +'
> +
> +test_expect_success 'log -S looks into binary files' '
> +	>.gitattributes &&
> +	git log -Sa >actual &&
> +	test_cmp actual full-log
> +'
> +
>  test_done

Thanks for the pointer. This is resolved in v3 as well. I'm not used to test cases that
depend on each other, but you are totally right.

Thanks for the review.

^ permalink raw reply

* Git blame performance on files with a lot of history
From: Clement Moyroud @ 2018-12-14 18:29 UTC (permalink / raw)
  To: git

Hello,

My group at work is migrating a CVS repo to Git. The biggest issue we
face so far is the performance of git blame, especially compared to
CVS on the same file. One file especially causes us trouble: it's a
30k-line file with 25 years of history in 3k+ commits. The complete
repo has 200k+ commits over that same period of time.

Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
(without -M nor -C) takes 145s.

I tried using the commit-graph with the Bloom filter, per
https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/.
No dice:
    > time GIT_TEST_BLOOM_FILTERS=1 /wv/cmoyroud/calibre-src/git-bloom-filters/git-bloom-bin/bin/git commit-graph write --reachable
    Annotating commits in commit graph: 573705, done.
    Computing commit graph generation numbers: 100% (286441/286441), done.
    Computing commit diff Bloom filters: 100% (286441/286441), done.
    GIT_TEST_BLOOM_FILTERS=1  commit-graph write --reachable  386.80s user 31.78s system 78% cpu 8:53.87 total
    > time GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git blame master -- important/file.C > /tmp/foo.compiler.bloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y   145.11s user 0.97s system 99% cpu 2:26.22 total
    > time /path/to/git blame master -- important/file.C > /tmp/foo.compiler.nobloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TEST_BLOOM_FILTERS=1 GIT_USE_POC_BLOOM_FILTER=y   141.69s user 0.77s system 99% cpu 2:22.56 total

I used Derrick Stolee's tree at
https://github.com/derrickstolee/git/tree/bloom/stolee

Looking at the blame code, it does not seem to be able to use the
commit graph, so I tried the same rev-list command from the e-mail,
using my own file:
    > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git rev-list --count --full-history HEAD -- important/file.C
    3576

No trace information there either. Running 'strings' on the binary
reports the env. variable names, so I'm not totally crazy. Let me know
if I tried the right thing :)

Looks like blame performance is gonna be the biggest issue for us, so
I'm really interested in seeing improvements there. Let me know if
there's anything else I can try.

Cheers,

Clément

^ permalink raw reply

* Re: [PATCH 1/1] worktree refs: fix case sensitivity for 'head'
From: Duy Nguyen @ 2018-12-14 17:46 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Mike Rappazzo, Stefan Beller, gitgitgadget, Git Mailing List,
	Junio C Hamano
In-Reply-To: <CACsJy8CT8K9SHnTsJT4HrxAK95yTz-x2DnNRBYKkvMyGbBZCgg@mail.gmail.com>

On Fri, Dec 14, 2018 at 6:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Dec 14, 2018 at 6:22 PM Jacob Keller <jacob.keller@gmail.com> wrote:
> >
> > On Thu, Dec 13, 2018 at 11:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > > Even with a new ref storage, I'm pretty sure pseudo refs like HEAD,
> > > FETCH_HEAD... will forever be backed by filesystem. HEAD for example
> > > is part of the repository signature and must exist as a file. We could
> > > also lookup pseudo refs with readdir() instead of lstat(). On
> > > case-preserving-and-insensitive filesystems, we can reject "head" this
> > > way. But that comes with a high cost.
> > > --
> > > Duy
> >
> > Once other refs are backed by something that doesn't depend on
> > filesystem case sensitivity, you could enforce that we only accept
> > all-caps HEAD as a pseudo ref, and always look up other spellings in
> > the other refs backend, though, right?
>
> Hmm.. yes. I don't know offhand if we have any pseudo refs in
> lowercase. Unlikely, so yes, this should work.

One thing we could do _today_ without waiting for a new refs backend
is, avoid looking up pseudo refs if the given ref name is not
all-caps. So "head" (or hEAd) can match refs/head, refs/tags/head,
refs/heads/head but never $GIT_DIR/HEAD. And yes, I checked the code:
pseudo refs must be all-caps.
-- 
Duy

^ permalink raw reply

* Re: [PATCH 1/1] worktree refs: fix case sensitivity for 'head'
From: Duy Nguyen @ 2018-12-14 17:38 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Mike Rappazzo, Stefan Beller, gitgitgadget, Git Mailing List,
	Junio C Hamano
In-Reply-To: <CA+P7+xoxE0o=5fMQrDoyCgWMQ-By2t1LdApecRDWmoXXCfnFuw@mail.gmail.com>

On Fri, Dec 14, 2018 at 6:22 PM Jacob Keller <jacob.keller@gmail.com> wrote:
>
> On Thu, Dec 13, 2018 at 11:38 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > Even with a new ref storage, I'm pretty sure pseudo refs like HEAD,
> > FETCH_HEAD... will forever be backed by filesystem. HEAD for example
> > is part of the repository signature and must exist as a file. We could
> > also lookup pseudo refs with readdir() instead of lstat(). On
> > case-preserving-and-insensitive filesystems, we can reject "head" this
> > way. But that comes with a high cost.
> > --
> > Duy
>
> Once other refs are backed by something that doesn't depend on
> filesystem case sensitivity, you could enforce that we only accept
> all-caps HEAD as a pseudo ref, and always look up other spellings in
> the other refs backend, though, right?

Hmm.. yes. I don't know offhand if we have any pseudo refs in
lowercase. Unlikely, so yes, this should work.

> So, yea the actual file may not
> be case sensitive, but we would never create refs/head anymore for any
> reason, so there would be no ambiguity if reading the refs/head vs
> refs/HEAD on a case insensitive file system, since refs/head would no
> longer be a legitimate ref stored as a file if you used a different
> refs backend.
>
> Basically, we'd be looking up HEAD by checking the file, but we'd stop
> looking up head, hEAd, etc in the files, and instead use whatever
> other refs backend for non-pseudo refs. Thus, it wouldn't matter,
> since we'd never actually lookup the other spellings of HEAD as a
> file. Wouldn't that solve the ambiguity, at least once a repository
> has fully switched to some alternative refs backend for non-pseudo
> refs? (Unless I misunderstand and refs/head could be an added pseudo
> ref?)

No, I think "pseudo refs" are only those outside the "refs" directory. So
"refs/head" would be a "normal" ref.

> Jake



-- 
Duy

^ permalink raw reply
