Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] gitweb/Makefile: Remove static/gitweb.js in the clean target
From: Junio C Hamano @ 2011-10-27 17:49 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Jakub Narebski, Drew Northup, Junio C Hamano, GIT Mailing-list
In-Reply-To: <4EA87C06.8080808@ramsay1.demon.co.uk>

Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:

>> gitweb.js is nowadays a generated file.  Though that bit should be
>> in commit message...
>
> Yep, will do ...

Thanks; here is what I already queued.

-- >8 --
From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Date: Tue, 25 Oct 2011 18:15:20 +0100
Subject: [PATCH] gitweb/Makefile: Remove static/gitweb.js in the clean target

Since 9a86dd5 (gitweb: Split JavaScript for maintability, combining on
build, 2011-04-28), static/gitweb.js has been a build product that should
be cleaned upon "make clean".

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 gitweb/Makefile |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index 5d20515..c360284 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -183,7 +183,9 @@ install: all
 ### Cleaning rules
 
 clean:
-	$(RM) gitweb.cgi static/gitweb.min.js static/gitweb.min.css GITWEB-BUILD-OPTIONS
+	$(RM) gitweb.cgi static/gitweb.js \
+		static/gitweb.min.js static/gitweb.min.css \
+		GITWEB-BUILD-OPTIONS
 
 .PHONY: all clean install test test-installed .FORCE-GIT-VERSION-FILE FORCE
 
-- 
1.7.7.1.552.g2c3d8

^ permalink raw reply related

* Re: [PATCH] git grep: be careful to use mutices only when they are initialized
From: Jeff King @ 2011-10-27 18:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, msysgit, git
In-Reply-To: <7v39ef34in.fsf@alter.siamese.dyndns.org>

On Wed, Oct 26, 2011 at 01:02:40PM -0700, Junio C Hamano wrote:

>  - Could we lose "#ifndef NO_PTHREADS" inside grep_sha1(), grep_file(),
>    and possibly cmd_grep() functions and let the compiler optimize things
>    away under NO_PTHREADS compilation?

I don't think so. If NO_PTHREADS is set, we might not have pthread
functions at all. Sure, many compilers will optimize:

  if (0)
          pthread_mutex_lock(...);

to remove the call completely. But would a compiler be wrong to complain
that pthread_mutex_lock is not defined, or to include reference to it
for the linker? gcc, both with and without optimizations, will complain
about:

  echo 'int main() { if (0) does_not_exist(); return 0; }' >foo.c
  gcc -Wall -c foo.c

though it does actually remove the dead code and link properly. I
wouldn't be surprised if some other compilers don't work, though (and of
course the warning is ugly).

I think you would have to do something like this in thread-utils.h:

  #ifndef NO_PTHREADS
  #include <pthread.h>
  #else
  #define pthread_mutex_t int
  #define pthread_mutex_init(m, a) do {} while(0)
  #define pthread_mutex_lock(m) do {} while(0)
  #define pthread_mutex_unlock(m) do {} while (0)
  /* and so forth for every pthread function */
  #endif

-Peff

^ permalink raw reply

* Re: [PATCH/WIP 01/11] Introduce "check-attr --excluded" as a replacement for "add --ignore-missing"
From: Junio C Hamano @ 2011-10-27 18:08 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1319438176-7304-2-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> --ignore-missing is used by submodule to check if a path may be
> ignored by .gitignore files. It does not really fit in git-add (git
> add takes pathspec, but --ignore-missing takes only paths)
>
> Google reckons that --ignore-missing is not used anywhere but
> git-submodule.sh. Remove --ignore-missing and introduce "check-attr
> --excluded" as a replacement.

Hmm. "add --ignore-missing" somehow does not fit very well with other uses
of the option of the same name. In all other contexts, "ignore-missing"
means just that: ignore the fact that whatever _thing_ we were made to
expect to exist by the instruction from the user does not exist, which
usually results in an error or a report. "add --ignore-missing" does not
seem to be that (for one thing, it requires --dry-run).

It is unclear to me what is supposed to do to after reading the three-line
documentation in the manpage X-<.

So I am perfectly OK with the removal in the current form.

But I do not think "is this path ignored with the .gitignore rules" check
belongs to check-attr, either.

The pattern of having .scmignore files to list ignored paths were forced
upon us by historical version control systems, in the name of "the users
expect it". If there weren't such constraints, it would have been far
nicer---we could have just said "if you want to ignore paths, just use the
attributes mechanism and give them the 'ignored' attribute" without having
to have the exclude mechanism.

But we do not live in that ideal world.

Perhaps ls-files is a more suitable home for the feature?

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 18:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7vvcrd411x.fsf@alter.siamese.dyndns.org>

On Tue, Oct 25, 2011 at 07:07:38AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I mean, besides the obvious that UTF-16 is ...
> 
> Yes, you could, besides the obvious. But that obvious reason makes it
> sufficiently different that it may not be so outrageous to draw the line
> between it and all the others.

Yeah, and I'm OK with that. It's just not a satisfying answer to give
Windows people who think UTF-16 is a good idea. But at the very least,
it's still unicode. It should be lossless for them to convert to utf8
and back if they want.

Speaking of which, I've been looking at handling diffing of utf-16
files. Right now we generally just consider them binary, which sucks.
It's easy to identify them by BOM in the is_buffer_binary() code, but
that's only part of it. We do an OK job of diffing them, except that:

  1. The BOM makes some diffs a little noisier.

  2. We split lines on 0x0a. But this byte can appear in other code
     points, like 0x010a (Ċ), or the entire entire 0x0a* code point (the
     entire Gurmukhi charset).

I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
to utf8, and then display them in utf8. Is that too gross for us to
consider?

You can kind-of implement this outside of git using textconv. But you
have to manually mark each file as utf-16, as there's no way to trigger
an alternative diff driver on something like a BOM.

I'm really not clear on how people with utf-16 files work. Even if we
did treat utf-16 like text, the _rest_ of git is outputting ascii, so
it's not like their terminals are utf-16. But we do have projects on
github with utf-16 and utf-32 encodings.

-Peff

^ permalink raw reply

* Re: [PATCH] Fix 'Cloning into' message
From: Richard Hartmann @ 2011-10-27 18:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vvcraz64i.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 19:39, Junio C Hamano <gitster@pobox.com> wrote:


> We try to be consistent and many places do quote user supplied paths in
> pair of single quotes in human readable messages, and this is in line with
> that pattern.

Exactly. That is why I chose single quotes. Sorry for not including
the signed-off the first time.


>    $ git clone foo "joey's foo"

In that case, the user has other problems ;)


Richard

^ permalink raw reply

* Re: [PATCH/WIP 05/11] symbolize return values of tree_entry_interesting()
From: Junio C Hamano @ 2011-10-27 18:36 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1319438176-7304-6-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This helps extending the value later on for "interesting, but cannot
> decide if the entry truely matches yet" (ie. prefix matches)
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Good change; it is a basic code hygiene to avoid magic constants anyway.

> diff --git a/tree-diff.c b/tree-diff.c
> index 6782484..25cc981 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -114,12 +114,13 @@ static void show_entry(struct diff_options *opt, const char *prefix,
>  }
>  
>  static void skip_uninteresting(struct tree_desc *t, struct strbuf *base,
> -			       struct diff_options *opt, int *match)
> +			       struct diff_options *opt,
> +			       enum interesting *match)
>  {
>  	while (t->size) {
>  		*match = tree_entry_interesting(&t->entry, base, 0, &opt->pathspec);
>  		if (*match) {
> -			if (*match < 0)
> +			if (*match == all_entries_not_interesting)
>  				t->size = 0;
>  			break;
>  		}

The caller of this function needs to be updated as well.

But I have to wonder why this skip_uninteresting() does not peek the
original value of *match and skip, which is the loop structure the other
caller of tree_entry_interesting() in this file has.

diff --git a/tree-diff.c b/tree-diff.c
index 25cc981..de4ba28 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -133,7 +133,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 {
 	struct strbuf base;
 	int baselen = strlen(base_str);
-	int t1_match = 0, t2_match = 0;
+	enum interesting t1_match, t2_match;
 
 	/* Enable recursion indefinitely */
 	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
@@ -142,6 +142,9 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 	strbuf_init(&base, PATH_MAX);
 	strbuf_add(&base, base_str, baselen);
 
+	/* Initialize to something other than all_entries_not_interesting */
+	t1_match = t2_match = entry_not_interesting;
+
 	for (;;) {
 		if (diff_can_quit_early(opt))
 			break;

^ permalink raw reply related

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-27 18:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027181303.GF1967@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
> to utf8, and then display them in utf8. Is that too gross for us to
> consider?

I tend to think so; it is entirely a different matter if the user
instructed us to clean/smudge UTF-16 payload into/outof UTF-8.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v7h3qz2yo.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 11:47:27AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
> > to utf8, and then display them in utf8. Is that too gross for us to
> > consider?
> 
> I tend to think so; it is entirely a different matter if the user
> instructed us to clean/smudge UTF-16 payload into/outof UTF-8.

Minor nit, but this is just for diff, so it is not about clean/smudge
but rather about doing something like textconv.

The other option I mentioned would be something like detecting the BOM
and pretending as if the attribute "diff=utf-16" was set (which would do
nothing by default). Then people could set themselves up to handle
utf-16 if they wanted, but wouldn't have to go around marking each file
with .gitattributes.

But maybe that is too gross, too, and they should just use
.gitattributes.

-Peff

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-27 19:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027185220.GA26621@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Oct 27, 2011 at 11:47:27AM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
>> > to utf8, and then display them in utf8. Is that too gross for us to
>> > consider?
>> 
>> I tend to think so; it is entirely a different matter if the user
>> instructed us to clean/smudge UTF-16 payload into/outof UTF-8.
>
> Minor nit, but this is just for diff, so it is not about clean/smudge
> but rather about doing something like textconv.

I can understand if some tools in the Windows land prefer to work with
these encodings, so clean/smudge to have the checkout in these encodings
would be a reasonable thing not just diff but things like grep. On the
other hand, I do doubt the sanity of these people if they want to have
in-repository representation also in these encodings.

So I do not think "it is just for diff" is any improvement.

^ permalink raw reply

* Repository data loss in fast-export with a merge of a deleted submodule
From: Joshua Jensen @ 2011-10-27 19:27 UTC (permalink / raw)
  To: git@vger.kernel.org

Hello.

We had a submodule that we deleted and then added back into the 
repository at the same location as the former submodule.  When running 
fast-export, the newly 'added' files for the merge commit are listed and 
then are followed with a:

M ... path/to/submodule/file
D path/to/submodule

On fast-import, the resultant repository becomes corrupt due to the 
Delete instruction above occurring AFTER the file adds/modifications.  
The new repository does not match the old repository where the 
fast-export was performed.

I have included a repro script below.  I have not been able to test this 
on Git 1.7.7.1, but I have tested on Git 1.7.7 (msysGit version).

Please compare the differences between the generated main.fe and 
newmain.fe files.  newmain.fe has data loss.

I am not familiar with the fast-export code.  Can anyone help out?

Thanks.

Josh

---------

rm -rf main brokenmain sub main.fenewmain.fe

# Create the submodule.
mkdir sub
cd sub
git init
echo file > file
git add file
git commit -m file
cd ..

# Create the main repository.
mkdir main
cd main
git init

# Add the submodule.
git submodule add ../sub sub
git commit -m "Add submodule"

# Remove the submodule.
rm -rf sub
git rm sub .gitmodules
git commit -m "Remove submodule"

# Add sub/file to the master branch.
mkdir sub
echo file > sub/file
git add sub/file
git commit -m "Add sub/file"
if [ -f sub/file ]; then
     echo "main: master branch: sub/file exists."
fi

# Delete the submodule directory manually, because we know that the 
incoming merge will need it gone.
git checkout -B will-be-broken HEAD^^
rm -rf sub
git merge --no-ff master

# sub/file exists within the 'will-be-broken' branch.
if [ -f sub/file ]; then
     echo "main: will-be-broken branch: sub/file exists."
fi

# Export out the main repository.
git fast-export --all > ../main.fe

# Create the brokenmain repository.
cd ..
mkdir brokenmain
cd brokenmain
git init

# Import in everything from the main repository.
git fast-import < ../main.fe

# sub/file exists within the master branch.
git checkout master
if [ -f sub/file ]; then
     echo "brokenmain: master branch: sub/file exists."
fi

# sub/file SHOULD exist within the 'will-be-broken' branch but doesn't.
git checkout will-be-broken
if [ ! -f sub/file ]; then
     echo "brokenmain: will-be-broken branch: sub/file SHOULD exist but 
doesn't."
fi

# Export out the brokenmain repository.
git fast-export --all > ../brokenmain.fe

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 21:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <1310084657-16790-3-git-send-email-gitster@pobox.com>

On Fri, Jul 8, 2011 at 02:24, Junio C Hamano <gitster@pobox.com> wrote:

>  - Commits at the tip of tags are written together, in the hope that
>   revision traversal done in incremental fetch (which starts by
>   putting them in a revision queue marked as UNINTERESTING) will see a
>   better locality of these objects;

We recently upgraded to 1.7.7 and I've traced a very large slowdown in
packing down to this commit.

On our repository packing used to take around 30 seconds, it now takes
about 4 minutes. Which means that cloning the repository went from
being slightly slow to pretty intolerable.

I haven't dug into why this is but I'm pretty sure it's because this
patch makes Git behave pathologically on repositories with a large
amount of tags. git.git itself has ~27k revs / and ~450 tags, or a tag
every ~60 revisions.

Try it on e.g. a repository with a couple of hundred thousand revs and
a tag every 10-20 revisions.

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:01, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> I haven't dug into why this is but I'm pretty sure it's because this
> patch makes Git behave pathologically on repositories with a large
> amount of tags. git.git itself has ~27k revs / and ~450 tags, or a tag
> every ~60 revisions.
>
> Try it on e.g. a repository with a couple of hundred thousand revs and
> a tag every 10-20 revisions.

Actually it just seems slow in general, not just on repositories with
a lot of tags, on linux-2.6.git with this patch:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog
--delta-base-offs</dev/null
/home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    1m28.426s
    user    1m19.661s
    sys     0m3.008s

With it reverted:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog --delta-base-offset
</dev/null /home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    0m50.152s
    user    0m45.367s
    sys     0m2.920s

And if you create a lot of tags:

    git log --pretty=%h | perl -lne 'chomp(my $sha = <>); print $sha
if $i++ % 10 == 0' | parallel "git tag test-tag-{} {}"

Resulting in:

    $ git rev-list HEAD | wc -l
    269514
    $ git tag -l | wc -l
    13733

This'll be the time with this patch reverted, i.e. almost exactly the
same as with just ~250 tags:

    real    0m52.860s
    user    0m44.907s
    sys     0m3.172s

And with it applied, i.e. almost exactly the same as with just ~250
tags:

    real    1m29.080s
    user    1m21.825s
    sys     0m3.096s

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6ZWOF=j-k8o-4NHmjS2HpyS+PmKjJh_QKevWurBf9pbA@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> Actually it just seems slow in general, not just on repositories with
> a lot of tags, on linux-2.6.git with this patch:

Here's profiling with gprof for everything with >1% of execution time
with the patch applied:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.07     15.99    15.99  2184059     0.00     0.00
add_descendants_to_write_order
     20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
     11.94     40.41     9.06 142180385     0.00     0.00  hashcmp
      5.55     44.62     4.21 90592818     0.00     0.00  lookup_object
      4.64     48.14     3.52 72804470     0.00     0.00  hashcmp
      3.87     51.08     2.94 90007452     0.00     0.00  get_mode
      3.31     53.59     2.51 90007452     0.00     0.00  decode_tree_entry
      1.90     55.03     1.44  2184059     0.00     0.00
add_family_to_write_order
      1.79     56.39     1.36 43247856     0.00     0.00  hashcmp
      1.29     57.37     0.98                             pack_offset_sort
      1.27     58.33     0.96 90007452     0.00     0.00  update_tree_entry
      1.27     59.29     0.96 90592817     0.00     0.00  hashtable_index
      1.20     60.20     0.91  4009188     0.00     0.00  find_pack_revindex
      1.19     61.10     0.90  5899321     0.00     0.00  find_pack_entry_one
      1.12     61.95     0.85   269514     0.00     0.00
commit_list_insert_by_date
      1.08     62.77     0.82  5387773     0.00     0.00  patch_delta

And without:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
     10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
      8.48     17.31     3.64 72638478     0.00     0.00  hashcmp
      6.60     20.14     2.83 90007452     0.00     0.00  decode_tree_entry
      6.15     22.77     2.64 90007452     0.00     0.00  get_mode
      2.99     24.05     1.28 43247182     0.00     0.00  hashcmp
      2.96     25.32     1.27 90592817     0.00     0.00  hashtable_index
      2.47     26.38     1.06 90007452     0.00     0.00  update_tree_entry
      2.26     27.35     0.97  4009188     0.00     0.00  find_pack_revindex
      2.05     28.23     0.88   269245     0.00     0.00  process_tree
      1.96     29.07     0.84   269514     0.00     0.00
commit_list_insert_by_date
      1.94     29.90     0.83                             pack_offset_sort
      1.73     30.64     0.74  5389900     0.00     0.00  patch_delta
      1.70     31.37     0.73  5885588     0.00     0.00  find_pack_entry_one
      1.38     31.96     0.59  8692967     0.00     0.00  hashcmp
      1.24     32.49     0.53  8175096     0.00     0.00
unpack_object_header_buffer
      1.14     32.98     0.49        1     0.49     0.59  write_idx_file
      1.12     33.46     0.48  5885588     0.00     0.00
nth_packed_object_offset
      1.12     33.94     0.48  6051632     0.00     0.00
locate_object_entry_hash

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Junio C Hamano @ 2011-10-27 22:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
> packing down to this commit.

Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
objects unnecessarily, 2011-10-18) help?

^ permalink raw reply

* Re: [PATCH 4/4] pack-objects: rewrite add_descendants_to_write_order() iteratively
From: Junio C Hamano @ 2011-10-27 22:13 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-4-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> This removes the need to call this function recursively, shinking the
> code size slightly and netting a small performance increase.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

Tricky.

As long as this is done after compute_write_order() populates
delta_sibling vs delta link in a consistent way, the new logic should
produce exactly the same result as the original code.

Thanks.

^ permalink raw reply

* git alias and --help
From: Gelonida N @ 2011-10-27 22:20 UTC (permalink / raw)
  To: git

I'm having a tiny question about git aliases.

in ~/.gitconfig I defined an alias

b = branch



This works well execp if I type


git b --help

Now I get the output
`git b' is aliased to `branch'


which is nice to know however what I reall wanted is to get the helptext
for the branch command.


Is there any trick to get the help text of git branch without
having to type

git branch --help

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-10-27 22:26 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-3-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> Two optimizations take place here- we can start our objects array
> iteration from a known point where we left off before we started trying
> to find our tags,

This I would understand (but I am somewhat curious how much last_untagged
would advance relative to nr_objects for this half of the optimization to
be worth it), but...

> and we don't need to do the deep dives required by
> add_family_to_write_order() if the object has already been marked as
> filled.

I am not sure if this produces the identical result that was benchmarked
in the original series.

For example, if you have a tagged object that is not a commit (say a
blob), you would have written that blob in the second phase (write tagged
objects together), so the family of blobs that share same delta parent as
that blob will not be written in this "Finally all the rest" in the right
place in the original list, no?

I do not think this change would forget to fill an object that needs to be
filled, but it would affect the resulting ordering of the list, so...

> @@ -560,8 +561,13 @@ static struct object_entry **compute_write_order(void)
>  	/*
>  	 * Finally all the rest in really tight order
>  	 */
> -	for (i = 0; i < nr_objects; i++)
> -		add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	for (i = last_untagged; i < nr_objects; i++) {
> +		if (!objects[i].filled)
> +			add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	}
> +
> +	if(wo_end != nr_objects)
> +		die("ordered %u objects, expected %u", wo_end, nr_objects);

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:28 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <j8clg9$ldh$1@dough.gmane.org>

Gelonida N <gelonida@gmail.com> writes:

> Is there any trick to get the help text of git branch without
> having to type
>
> git branch --help

How about "git help branch"?

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Jakub Narebski @ 2011-10-27 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Shawn O. Pearce
In-Reply-To: <CACBZZX7tghoHhxCygEj9DZSxvKyTvybawVA2HwHBkjBaH73Ujg@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> > Actually it just seems slow in general, not just on repositories with
> > a lot of tags, on linux-2.6.git with this patch:
> 
> Here's profiling with gprof for everything with >1% of execution time
> with the patch applied:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.07     15.99    15.99  2184059     0.00     0.00  add_descendants_to_write_order
>      20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
[...]
> 
> And without:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
>      10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
[...]

Errr... do or do not gprof results include time spend in libraries?
Though that might not matter for this case.

Can you repeat profiling using "perf events" or something using it
(e.g. via PAPI library like HPCToolkit)?

-- 
Jakub Narębski

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:50 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <7vfwiexe6m.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Gelonida N <gelonida@gmail.com> writes:
>
>> Is there any trick to get the help text of git branch without
>> having to type
>>
>> git branch --help
>
> How about "git help branch"?

It was bad of me to write a tongue-in-cheek answer, get distracted and
ended up sending it without the real answer.

The reason why we do not do what you seem to be suggesting is because
giving the same behaviour to "git b --help" as "git branch --help" is
wrong.

To see why, imagine you have configured an alias that is not a simple and
stupid substitution "b == branch", but something like "bt == branch -t",
and then want to know what you should write after "git bt".  Giving the
manpage for branch without giving them any hint that they configured that
alias to produce customized behaviour that is different from plain vanilla
"branch" is not quite acceptable.

I think you _could_ make a patch that special cases a simple and straight
substitution and skip the "foo is aliased to bar" step, but I doubt it is
worth it to lose consistency between "git b --help" vs "git bt --help"
that way.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Atsushi Nakagawa @ 2011-10-27 23:00 UTC (permalink / raw)
  To: kusmabite; +Cc: Kyle Moffett, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <CABPQNSY9LdqOK=1nN_61ZRMv-ieXzSDYgNsQe0w21RAOw_D7yA@mail.gmail.com>

Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> >>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> >>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> >>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> >>>>>> +{
> >>>>>> + [snip]
> >>>>>
> >>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
> >>> [snip]
> >>>
> >>>  if (mutex->autoinit) {
> >>>
> >>> Assume two threads enter this block.
> >>>
> >>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> >>>
> >>> Only one thread, A, say on CPU A, will enter this block.
> >>>
> >>>        InitializeCriticalSection(&mutex->cs);
> >>>
> >>> Thread A writes some values. Note that there are no memory barriers
> >>> involved here. Not that I know of or that they would be documented.
> >>>
> >>>        mutex->autoinit = 0;
> >>>
> >>> And it writes another one. Thread A continues below to contend for the
> >>> mutex it just initialized.
> >>>
> >>>     } else
> >>>
> >>> Meanwhile, thread B, say on CPU B, spins in this loop:
> >>>
> >>>        while (mutex->autoinit != 0)
> >>>           ; /* wait for other thread */
> >>>
> >>> When thread B arrives here, it sees the value of autoinit that thread A
> >>> has written above.
> >>>
> >>> [snip]
> >>>
> >>
> >> Thanks for pointing this out, I completely forgot about write re-ordering.
> >>
> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> >> InterlockedExchange generates a full memory barrier:
> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
> >
> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >
> > A write barrier in one thread is only effective if it is paired with a
> > read barrier in the other thread.
> >
> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> > you don't have any guaranteed ordering.

Out of curiosity, where could re-ordering be a problem here?  I'm
thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
of "mutex->cs" not being propagated to the waiting thread.  However,
shouldn't that be a non-problem, as far as compiler reordering goes,
because it's an external function call and only the address of mutex->cs
is passed?

The only other cause I could think of is if ordering at the CPU was
somehow different (it could be if there're no special provisions for
calling external functions) or if "InterlockedExchange(&mutex->autoinit,
0)" wasn't atomic in updating autoinit and doing the memory barrier.

Either way, I couldn't vouch for the safety of the above logic without
a memory barrier so this question is purely of an academical nature. :)

> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
> 
> OK, so I should probably do something like this instead?
> 
> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
> 	; /* wait for other thread */

Technically, assuming only the updating of "mutex->cs" is in question,
the ICE should only be required once after exiting the loop...

There's a question of the propagation of the value of "mutex->autoinit"
itself, but my take is that the memory barrier on the writing thread
will push out the updated value across all CPUs, thus preventing an
infinite loop.  The other factors, value caching and loop optimization
by the compiler, should be prevented by the "volatile" keyword even with
gcc or MSVC 2003.

> I really appreciate getting some extra eyes on this, thanks.
> Concurrent programming is not my strong-suit (as this exercise has
> shown) ;)

So would I. :)

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Kyle Moffett @ 2011-10-27 23:20 UTC (permalink / raw)
  To: Atsushi Nakagawa
  Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <20111028080033.57FC.B013761@chejz.com>

On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
>> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> >> Thanks for pointing this out, I completely forgot about write re-ordering.
>> >>
>> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> >> InterlockedExchange generates a full memory barrier:
>> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>> >
>> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>> >
>> > A write barrier in one thread is only effective if it is paired with a
>> > read barrier in the other thread.
>> >
>> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
>> > you don't have any guaranteed ordering.
>
> Out of curiosity, where could re-ordering be a problem here?  I'm
> thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> of "mutex->cs" not being propagated to the waiting thread.  However,
> shouldn't that be a non-problem, as far as compiler reordering goes,
> because it's an external function call and only the address of mutex->cs
> is passed?
>
> The only other cause I could think of is if ordering at the CPU was
> somehow different (it could be if there're no special provisions for
> calling external functions) or if "InterlockedExchange(&mutex->autoinit,
> 0)" wasn't atomic in updating autoinit and doing the memory barrier.
>
> Either way, I couldn't vouch for the safety of the above logic without
> a memory barrier so this question is purely of an academical nature. :)
>
>> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
>>
>> OK, so I should probably do something like this instead?
>>
>> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
>>       ; /* wait for other thread */
>
> Technically, assuming only the updating of "mutex->cs" is in question,
> the ICE should only be required once after exiting the loop...
>
> There's a question of the propagation of the value of "mutex->autoinit"
> itself, but my take is that the memory barrier on the writing thread
> will push out the updated value across all CPUs, thus preventing an
> infinite loop.  The other factors, value caching and loop optimization
> by the compiler, should be prevented by the "volatile" keyword even with
> gcc or MSVC 2003.
>
>> I really appreciate getting some extra eyes on this, thanks.
>> Concurrent programming is not my strong-suit (as this exercise has
>> shown) ;)
>
> So would I. :)

Ok, so here's the race condition:

Thread1				Thread2
				/* Speculative prefetch */
				prefetch(*mutex);

if (mutex->autoinit) {
if (ICE(&mutex->autoinit, -1, 1) != -1) {
/* Now mutex->autoinit == -1 */
pthread_mutex_init(mutex, NULL);
/* This forces writes out to memory */
ICE(&mutex->autoinit, 0, -1);

				if (mutex->autoinit) {} /* false */
				/* No read barrier here */
				EnterCriticalSection(&mutex->cs);
				/* Use cached mutex->cs from earlier */

Even though you forced the memory write to be ordered in Thread 1 you
did not ensure that the read of autoinit occurred before the read of
mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
does is follow a pointer or read a mutex key value in mutex->cs then
won't necessarily get the right answer.

The rule of memory barriers is the ALWAYS come in pairs.  This simple
example guarantees that Thread2 will read "tmp_a" == 1 if it
previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
"tmp_b" != 1 is still possible.

Thread1:
a = 1;
write_barrier();
b = 1;

Thread2:
tmp_b = b;
read_barrier();
tmp_a = a;

I think there's a Documentation/memory-barriers.txt file in the kernel
source code with more helpful info.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 23:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v39eez1ph.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 12:14:34PM -0700, Junio C Hamano wrote:

> > Minor nit, but this is just for diff, so it is not about clean/smudge
> > but rather about doing something like textconv.
> 
> I can understand if some tools in the Windows land prefer to work with
> these encodings, so clean/smudge to have the checkout in these encodings
> would be a reasonable thing not just diff but things like grep. On the
> other hand, I do doubt the sanity of these people if they want to have
> in-repository representation also in these encodings.

I'm pretty much of the same mind. We do have people with utf-16 in their
repositories on github. I have no idea why they do such a thing, or what
kinds of tricks they do to make it usable (because without it, they just
get "binary files differ").

My interest is to make things like bare-repository diff (and everything
built on it; i.e., things like github, gitweb, or whatever) do the sane
thing for these people, even if I think what they're doing is wrong. And
as always, I try to structure the git portions of that as much as
possible to be general and help everybody, so they can be pushed
upstream (also, then I don't have to worry about managing local changes
:) ).

But it sounds like this is probably just too ugly and should end up as a
github-specific thing.

-Peff

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  0:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> My interest is to make things like bare-repository diff (and everything
> built on it; i.e., things like github, gitweb, or whatever) do the sane
> thing for these people, even if I think what they're doing is wrong.

I do not think we are talking about right or wrong. I was primarily saying
that textconv may not be the right thing (think github/gitweb showing blob
contents, nicely formatted inside the chrome the site provides).

The solution you suggested feels like a gross layering violation, unless
we do it everywhere, in which case I wouldn't mind too much.

We have in-repository representation that diff and grep and friends work
on, and output conversion layer that externalizes the result of them in
the form of "smudge". Another layer above the in-repository representation
and below operations could convert UTF-16 to UTF-8 when going outward and
in the opposite when going inward.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-28  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v1utyx9ri.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 05:03:29PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > My interest is to make things like bare-repository diff (and everything
> > built on it; i.e., things like github, gitweb, or whatever) do the sane
> > thing for these people, even if I think what they're doing is wrong.
> 
> I do not think we are talking about right or wrong. I was primarily saying
> that textconv may not be the right thing (think github/gitweb showing blob
> contents, nicely formatted inside the chrome the site provides).

But I think it is probably a wrong thing to store utf-16 as the
canonical format inside the git repository. Git simply can't handle it
for diffing. And the right thing, as you suggested, is clean/smudge.

But I'm dealing with repositories on the server side, where it is too
late to do clean/smudge; I just get whatever junk people commited.

> We have in-repository representation that diff and grep and friends work
> on, and output conversion layer that externalizes the result of them in
> the form of "smudge". Another layer above the in-repository representation
> and below operations could convert UTF-16 to UTF-8 when going outward and
> in the opposite when going inward.

I'm not sure that could sanely be done in a backwards compatible way.
Doing it with just textual diffs is a hack, of course, but at least we
know that the damage is limited, and the diff we generate on top doesn't
care that much about the original sha1s[1]. But should read_object_sha1
learn to convert utf-16 into utf-8? I think madness lies that way, as
we are breaking assumptions about sha1 validity.

-Peff

[1] Actually, the text diff does mention the original and resulting
sha1s, which would now either bear no relation to the diff text, or bear
no relation to what's in the repo. Either way, I think we are creating
something that can't necessarily be applied, which is bad. And is why I
thought of textconv, which is basically the same concept (and has the
same problems).

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox