Git development
 help / color / mirror / Atom feed
* Re: [PATCH] Fix 'Cloning into' message
From: Richard Hartmann @ 2011-10-27 18:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vvcraz64i.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 19:39, Junio C Hamano <gitster@pobox.com> wrote:


> We try to be consistent and many places do quote user supplied paths in
> pair of single quotes in human readable messages, and this is in line with
> that pattern.

Exactly. That is why I chose single quotes. Sorry for not including
the signed-off the first time.


>    $ git clone foo "joey's foo"

In that case, the user has other problems ;)


Richard

^ permalink raw reply

* Re: [PATCH/WIP 05/11] symbolize return values of tree_entry_interesting()
From: Junio C Hamano @ 2011-10-27 18:36 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1319438176-7304-6-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This helps extending the value later on for "interesting, but cannot
> decide if the entry truely matches yet" (ie. prefix matches)
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Good change; it is a basic code hygiene to avoid magic constants anyway.

> diff --git a/tree-diff.c b/tree-diff.c
> index 6782484..25cc981 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -114,12 +114,13 @@ static void show_entry(struct diff_options *opt, const char *prefix,
>  }
>  
>  static void skip_uninteresting(struct tree_desc *t, struct strbuf *base,
> -			       struct diff_options *opt, int *match)
> +			       struct diff_options *opt,
> +			       enum interesting *match)
>  {
>  	while (t->size) {
>  		*match = tree_entry_interesting(&t->entry, base, 0, &opt->pathspec);
>  		if (*match) {
> -			if (*match < 0)
> +			if (*match == all_entries_not_interesting)
>  				t->size = 0;
>  			break;
>  		}

The caller of this function needs to be updated as well.

But I have to wonder why this skip_uninteresting() does not peek the
original value of *match and skip, which is the loop structure the other
caller of tree_entry_interesting() in this file has.

diff --git a/tree-diff.c b/tree-diff.c
index 25cc981..de4ba28 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -133,7 +133,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 {
 	struct strbuf base;
 	int baselen = strlen(base_str);
-	int t1_match = 0, t2_match = 0;
+	enum interesting t1_match, t2_match;
 
 	/* Enable recursion indefinitely */
 	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
@@ -142,6 +142,9 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 	strbuf_init(&base, PATH_MAX);
 	strbuf_add(&base, base_str, baselen);
 
+	/* Initialize to something other than all_entries_not_interesting */
+	t1_match = t2_match = entry_not_interesting;
+
 	for (;;) {
 		if (diff_can_quit_early(opt))
 			break;

^ permalink raw reply related

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-27 18:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027181303.GF1967@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
> to utf8, and then display them in utf8. Is that too gross for us to
> consider?

I tend to think so; it is entirely a different matter if the user
instructed us to clean/smudge UTF-16 payload into/outof UTF-8.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v7h3qz2yo.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 11:47:27AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
> > to utf8, and then display them in utf8. Is that too gross for us to
> > consider?
> 
> I tend to think so; it is entirely a different matter if the user
> instructed us to clean/smudge UTF-16 payload into/outof UTF-8.

Minor nit, but this is just for diff, so it is not about clean/smudge
but rather about doing something like textconv.

The other option I mentioned would be something like detecting the BOM
and pretending as if the attribute "diff=utf-16" was set (which would do
nothing by default). Then people could set themselves up to handle
utf-16 if they wanted, but wouldn't have to go around marking each file
with .gitattributes.

But maybe that is too gross, too, and they should just use
.gitattributes.

-Peff

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-27 19:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027185220.GA26621@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Oct 27, 2011 at 11:47:27AM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > I'm tempted to detect the UTF-{16,32}{LE,BE} by their BOM, reencode them
>> > to utf8, and then display them in utf8. Is that too gross for us to
>> > consider?
>> 
>> I tend to think so; it is entirely a different matter if the user
>> instructed us to clean/smudge UTF-16 payload into/outof UTF-8.
>
> Minor nit, but this is just for diff, so it is not about clean/smudge
> but rather about doing something like textconv.

I can understand if some tools in the Windows land prefer to work with
these encodings, so clean/smudge to have the checkout in these encodings
would be a reasonable thing not just diff but things like grep. On the
other hand, I do doubt the sanity of these people if they want to have
in-repository representation also in these encodings.

So I do not think "it is just for diff" is any improvement.

^ permalink raw reply

* Repository data loss in fast-export with a merge of a deleted submodule
From: Joshua Jensen @ 2011-10-27 19:27 UTC (permalink / raw)
  To: git@vger.kernel.org

Hello.

We had a submodule that we deleted and then added back into the 
repository at the same location as the former submodule.  When running 
fast-export, the newly 'added' files for the merge commit are listed and 
then are followed with a:

M ... path/to/submodule/file
D path/to/submodule

On fast-import, the resultant repository becomes corrupt due to the 
Delete instruction above occurring AFTER the file adds/modifications.  
The new repository does not match the old repository where the 
fast-export was performed.

I have included a repro script below.  I have not been able to test this 
on Git 1.7.7.1, but I have tested on Git 1.7.7 (msysGit version).

Please compare the differences between the generated main.fe and 
newmain.fe files.  newmain.fe has data loss.

I am not familiar with the fast-export code.  Can anyone help out?

Thanks.

Josh

---------

rm -rf main brokenmain sub main.fenewmain.fe

# Create the submodule.
mkdir sub
cd sub
git init
echo file > file
git add file
git commit -m file
cd ..

# Create the main repository.
mkdir main
cd main
git init

# Add the submodule.
git submodule add ../sub sub
git commit -m "Add submodule"

# Remove the submodule.
rm -rf sub
git rm sub .gitmodules
git commit -m "Remove submodule"

# Add sub/file to the master branch.
mkdir sub
echo file > sub/file
git add sub/file
git commit -m "Add sub/file"
if [ -f sub/file ]; then
     echo "main: master branch: sub/file exists."
fi

# Delete the submodule directory manually, because we know that the 
incoming merge will need it gone.
git checkout -B will-be-broken HEAD^^
rm -rf sub
git merge --no-ff master

# sub/file exists within the 'will-be-broken' branch.
if [ -f sub/file ]; then
     echo "main: will-be-broken branch: sub/file exists."
fi

# Export out the main repository.
git fast-export --all > ../main.fe

# Create the brokenmain repository.
cd ..
mkdir brokenmain
cd brokenmain
git init

# Import in everything from the main repository.
git fast-import < ../main.fe

# sub/file exists within the master branch.
git checkout master
if [ -f sub/file ]; then
     echo "brokenmain: master branch: sub/file exists."
fi

# sub/file SHOULD exist within the 'will-be-broken' branch but doesn't.
git checkout will-be-broken
if [ ! -f sub/file ]; then
     echo "brokenmain: will-be-broken branch: sub/file SHOULD exist but 
doesn't."
fi

# Export out the brokenmain repository.
git fast-export --all > ../brokenmain.fe

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 21:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <1310084657-16790-3-git-send-email-gitster@pobox.com>

On Fri, Jul 8, 2011 at 02:24, Junio C Hamano <gitster@pobox.com> wrote:

>  - Commits at the tip of tags are written together, in the hope that
>   revision traversal done in incremental fetch (which starts by
>   putting them in a revision queue marked as UNINTERESTING) will see a
>   better locality of these objects;

We recently upgraded to 1.7.7 and I've traced a very large slowdown in
packing down to this commit.

On our repository packing used to take around 30 seconds, it now takes
about 4 minutes. Which means that cloning the repository went from
being slightly slow to pretty intolerable.

I haven't dug into why this is but I'm pretty sure it's because this
patch makes Git behave pathologically on repositories with a large
amount of tags. git.git itself has ~27k revs / and ~450 tags, or a tag
every ~60 revisions.

Try it on e.g. a repository with a couple of hundred thousand revs and
a tag every 10-20 revisions.

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:01, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> I haven't dug into why this is but I'm pretty sure it's because this
> patch makes Git behave pathologically on repositories with a large
> amount of tags. git.git itself has ~27k revs / and ~450 tags, or a tag
> every ~60 revisions.
>
> Try it on e.g. a repository with a couple of hundred thousand revs and
> a tag every 10-20 revisions.

Actually it just seems slow in general, not just on repositories with
a lot of tags, on linux-2.6.git with this patch:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog
--delta-base-offs</dev/null
/home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    1m28.426s
    user    1m19.661s
    sys     0m3.008s

With it reverted:

    $ time ~/g/git/git-pack-objects --keep-true-parents
--honor-pack-keep --non-empty --all --reflog --delta-base-offset
</dev/null /home/avar/g/linux-2.6/.git/objects/pack/.tmp-pack
    Counting objects: 2184059, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (338780/338780), done.
    130605d5b82492a38452b9bc7bddfb4a6ef0e942
    Writing objects: 100% (2184059/2184059), done.
    Total 2184059 (delta 1825129), reused 2184059 (delta 1825129)

    real    0m50.152s
    user    0m45.367s
    sys     0m2.920s

And if you create a lot of tags:

    git log --pretty=%h | perl -lne 'chomp(my $sha = <>); print $sha
if $i++ % 10 == 0' | parallel "git tag test-tag-{} {}"

Resulting in:

    $ git rev-list HEAD | wc -l
    269514
    $ git tag -l | wc -l
    13733

This'll be the time with this patch reverted, i.e. almost exactly the
same as with just ~250 tags:

    real    0m52.860s
    user    0m44.907s
    sys     0m3.172s

And with it applied, i.e. almost exactly the same as with just ~250
tags:

    real    1m29.080s
    user    1m21.825s
    sys     0m3.096s

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-27 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6ZWOF=j-k8o-4NHmjS2HpyS+PmKjJh_QKevWurBf9pbA@mail.gmail.com>

On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> Actually it just seems slow in general, not just on repositories with
> a lot of tags, on linux-2.6.git with this patch:

Here's profiling with gprof for everything with >1% of execution time
with the patch applied:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.07     15.99    15.99  2184059     0.00     0.00
add_descendants_to_write_order
     20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
     11.94     40.41     9.06 142180385     0.00     0.00  hashcmp
      5.55     44.62     4.21 90592818     0.00     0.00  lookup_object
      4.64     48.14     3.52 72804470     0.00     0.00  hashcmp
      3.87     51.08     2.94 90007452     0.00     0.00  get_mode
      3.31     53.59     2.51 90007452     0.00     0.00  decode_tree_entry
      1.90     55.03     1.44  2184059     0.00     0.00
add_family_to_write_order
      1.79     56.39     1.36 43247856     0.00     0.00  hashcmp
      1.29     57.37     0.98                             pack_offset_sort
      1.27     58.33     0.96 90007452     0.00     0.00  update_tree_entry
      1.27     59.29     0.96 90592817     0.00     0.00  hashtable_index
      1.20     60.20     0.91  4009188     0.00     0.00  find_pack_revindex
      1.19     61.10     0.90  5899321     0.00     0.00  find_pack_entry_one
      1.12     61.95     0.85   269514     0.00     0.00
commit_list_insert_by_date
      1.08     62.77     0.82  5387773     0.00     0.00  patch_delta

And without:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
     10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
      8.48     17.31     3.64 72638478     0.00     0.00  hashcmp
      6.60     20.14     2.83 90007452     0.00     0.00  decode_tree_entry
      6.15     22.77     2.64 90007452     0.00     0.00  get_mode
      2.99     24.05     1.28 43247182     0.00     0.00  hashcmp
      2.96     25.32     1.27 90592817     0.00     0.00  hashtable_index
      2.47     26.38     1.06 90007452     0.00     0.00  update_tree_entry
      2.26     27.35     0.97  4009188     0.00     0.00  find_pack_revindex
      2.05     28.23     0.88   269245     0.00     0.00  process_tree
      1.96     29.07     0.84   269514     0.00     0.00
commit_list_insert_by_date
      1.94     29.90     0.83                             pack_offset_sort
      1.73     30.64     0.74  5389900     0.00     0.00  patch_delta
      1.70     31.37     0.73  5885588     0.00     0.00  find_pack_entry_one
      1.38     31.96     0.59  8692967     0.00     0.00  hashcmp
      1.24     32.49     0.53  8175096     0.00     0.00
unpack_object_header_buffer
      1.14     32.98     0.49        1     0.49     0.59  write_idx_file
      1.12     33.46     0.48  5885588     0.00     0.00
nth_packed_object_offset
      1.12     33.94     0.48  6051632     0.00     0.00
locate_object_entry_hash

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Junio C Hamano @ 2011-10-27 22:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
> packing down to this commit.

Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
objects unnecessarily, 2011-10-18) help?

^ permalink raw reply

* Re: [PATCH 4/4] pack-objects: rewrite add_descendants_to_write_order() iteratively
From: Junio C Hamano @ 2011-10-27 22:13 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-4-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> This removes the need to call this function recursively, shinking the
> code size slightly and netting a small performance increase.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

Tricky.

As long as this is done after compute_write_order() populates
delta_sibling vs delta link in a consistent way, the new logic should
produce exactly the same result as the original code.

Thanks.

^ permalink raw reply

* git alias and --help
From: Gelonida N @ 2011-10-27 22:20 UTC (permalink / raw)
  To: git

I'm having a tiny question about git aliases.

in ~/.gitconfig I defined an alias

b = branch



This works well execp if I type


git b --help

Now I get the output
`git b' is aliased to `branch'


which is nice to know however what I reall wanted is to get the helptext
for the branch command.


Is there any trick to get the help text of git branch without
having to type

git branch --help

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-10-27 22:26 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-3-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> Two optimizations take place here- we can start our objects array
> iteration from a known point where we left off before we started trying
> to find our tags,

This I would understand (but I am somewhat curious how much last_untagged
would advance relative to nr_objects for this half of the optimization to
be worth it), but...

> and we don't need to do the deep dives required by
> add_family_to_write_order() if the object has already been marked as
> filled.

I am not sure if this produces the identical result that was benchmarked
in the original series.

For example, if you have a tagged object that is not a commit (say a
blob), you would have written that blob in the second phase (write tagged
objects together), so the family of blobs that share same delta parent as
that blob will not be written in this "Finally all the rest" in the right
place in the original list, no?

I do not think this change would forget to fill an object that needs to be
filled, but it would affect the resulting ordering of the list, so...

> @@ -560,8 +561,13 @@ static struct object_entry **compute_write_order(void)
>  	/*
>  	 * Finally all the rest in really tight order
>  	 */
> -	for (i = 0; i < nr_objects; i++)
> -		add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	for (i = last_untagged; i < nr_objects; i++) {
> +		if (!objects[i].filled)
> +			add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	}
> +
> +	if(wo_end != nr_objects)
> +		die("ordered %u objects, expected %u", wo_end, nr_objects);

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:28 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <j8clg9$ldh$1@dough.gmane.org>

Gelonida N <gelonida@gmail.com> writes:

> Is there any trick to get the help text of git branch without
> having to type
>
> git branch --help

How about "git help branch"?

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Jakub Narebski @ 2011-10-27 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Shawn O. Pearce
In-Reply-To: <CACBZZX7tghoHhxCygEj9DZSxvKyTvybawVA2HwHBkjBaH73Ujg@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> > Actually it just seems slow in general, not just on repositories with
> > a lot of tags, on linux-2.6.git with this patch:
> 
> Here's profiling with gprof for everything with >1% of execution time
> with the patch applied:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.07     15.99    15.99  2184059     0.00     0.00  add_descendants_to_write_order
>      20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
[...]
> 
> And without:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
>      10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
[...]

Errr... do or do not gprof results include time spend in libraries?
Though that might not matter for this case.

Can you repeat profiling using "perf events" or something using it
(e.g. via PAPI library like HPCToolkit)?

-- 
Jakub Narębski

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:50 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <7vfwiexe6m.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Gelonida N <gelonida@gmail.com> writes:
>
>> Is there any trick to get the help text of git branch without
>> having to type
>>
>> git branch --help
>
> How about "git help branch"?

It was bad of me to write a tongue-in-cheek answer, get distracted and
ended up sending it without the real answer.

The reason why we do not do what you seem to be suggesting is because
giving the same behaviour to "git b --help" as "git branch --help" is
wrong.

To see why, imagine you have configured an alias that is not a simple and
stupid substitution "b == branch", but something like "bt == branch -t",
and then want to know what you should write after "git bt".  Giving the
manpage for branch without giving them any hint that they configured that
alias to produce customized behaviour that is different from plain vanilla
"branch" is not quite acceptable.

I think you _could_ make a patch that special cases a simple and straight
substitution and skip the "foo is aliased to bar" step, but I doubt it is
worth it to lose consistency between "git b --help" vs "git bt --help"
that way.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Atsushi Nakagawa @ 2011-10-27 23:00 UTC (permalink / raw)
  To: kusmabite; +Cc: Kyle Moffett, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <CABPQNSY9LdqOK=1nN_61ZRMv-ieXzSDYgNsQe0w21RAOw_D7yA@mail.gmail.com>

Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> >>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> >>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> >>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> >>>>>> +{
> >>>>>> + [snip]
> >>>>>
> >>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
> >>> [snip]
> >>>
> >>>  if (mutex->autoinit) {
> >>>
> >>> Assume two threads enter this block.
> >>>
> >>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> >>>
> >>> Only one thread, A, say on CPU A, will enter this block.
> >>>
> >>>        InitializeCriticalSection(&mutex->cs);
> >>>
> >>> Thread A writes some values. Note that there are no memory barriers
> >>> involved here. Not that I know of or that they would be documented.
> >>>
> >>>        mutex->autoinit = 0;
> >>>
> >>> And it writes another one. Thread A continues below to contend for the
> >>> mutex it just initialized.
> >>>
> >>>     } else
> >>>
> >>> Meanwhile, thread B, say on CPU B, spins in this loop:
> >>>
> >>>        while (mutex->autoinit != 0)
> >>>           ; /* wait for other thread */
> >>>
> >>> When thread B arrives here, it sees the value of autoinit that thread A
> >>> has written above.
> >>>
> >>> [snip]
> >>>
> >>
> >> Thanks for pointing this out, I completely forgot about write re-ordering.
> >>
> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> >> InterlockedExchange generates a full memory barrier:
> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
> >
> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >
> > A write barrier in one thread is only effective if it is paired with a
> > read barrier in the other thread.
> >
> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> > you don't have any guaranteed ordering.

Out of curiosity, where could re-ordering be a problem here?  I'm
thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
of "mutex->cs" not being propagated to the waiting thread.  However,
shouldn't that be a non-problem, as far as compiler reordering goes,
because it's an external function call and only the address of mutex->cs
is passed?

The only other cause I could think of is if ordering at the CPU was
somehow different (it could be if there're no special provisions for
calling external functions) or if "InterlockedExchange(&mutex->autoinit,
0)" wasn't atomic in updating autoinit and doing the memory barrier.

Either way, I couldn't vouch for the safety of the above logic without
a memory barrier so this question is purely of an academical nature. :)

> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
> 
> OK, so I should probably do something like this instead?
> 
> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
> 	; /* wait for other thread */

Technically, assuming only the updating of "mutex->cs" is in question,
the ICE should only be required once after exiting the loop...

There's a question of the propagation of the value of "mutex->autoinit"
itself, but my take is that the memory barrier on the writing thread
will push out the updated value across all CPUs, thus preventing an
infinite loop.  The other factors, value caching and loop optimization
by the compiler, should be prevented by the "volatile" keyword even with
gcc or MSVC 2003.

> I really appreciate getting some extra eyes on this, thanks.
> Concurrent programming is not my strong-suit (as this exercise has
> shown) ;)

So would I. :)

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Kyle Moffett @ 2011-10-27 23:20 UTC (permalink / raw)
  To: Atsushi Nakagawa
  Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <20111028080033.57FC.B013761@chejz.com>

On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
>> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> >> Thanks for pointing this out, I completely forgot about write re-ordering.
>> >>
>> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> >> InterlockedExchange generates a full memory barrier:
>> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>> >
>> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>> >
>> > A write barrier in one thread is only effective if it is paired with a
>> > read barrier in the other thread.
>> >
>> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
>> > you don't have any guaranteed ordering.
>
> Out of curiosity, where could re-ordering be a problem here?  I'm
> thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> of "mutex->cs" not being propagated to the waiting thread.  However,
> shouldn't that be a non-problem, as far as compiler reordering goes,
> because it's an external function call and only the address of mutex->cs
> is passed?
>
> The only other cause I could think of is if ordering at the CPU was
> somehow different (it could be if there're no special provisions for
> calling external functions) or if "InterlockedExchange(&mutex->autoinit,
> 0)" wasn't atomic in updating autoinit and doing the memory barrier.
>
> Either way, I couldn't vouch for the safety of the above logic without
> a memory barrier so this question is purely of an academical nature. :)
>
>> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
>>
>> OK, so I should probably do something like this instead?
>>
>> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
>>       ; /* wait for other thread */
>
> Technically, assuming only the updating of "mutex->cs" is in question,
> the ICE should only be required once after exiting the loop...
>
> There's a question of the propagation of the value of "mutex->autoinit"
> itself, but my take is that the memory barrier on the writing thread
> will push out the updated value across all CPUs, thus preventing an
> infinite loop.  The other factors, value caching and loop optimization
> by the compiler, should be prevented by the "volatile" keyword even with
> gcc or MSVC 2003.
>
>> I really appreciate getting some extra eyes on this, thanks.
>> Concurrent programming is not my strong-suit (as this exercise has
>> shown) ;)
>
> So would I. :)

Ok, so here's the race condition:

Thread1				Thread2
				/* Speculative prefetch */
				prefetch(*mutex);

if (mutex->autoinit) {
if (ICE(&mutex->autoinit, -1, 1) != -1) {
/* Now mutex->autoinit == -1 */
pthread_mutex_init(mutex, NULL);
/* This forces writes out to memory */
ICE(&mutex->autoinit, 0, -1);

				if (mutex->autoinit) {} /* false */
				/* No read barrier here */
				EnterCriticalSection(&mutex->cs);
				/* Use cached mutex->cs from earlier */

Even though you forced the memory write to be ordered in Thread 1 you
did not ensure that the read of autoinit occurred before the read of
mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
does is follow a pointer or read a mutex key value in mutex->cs then
won't necessarily get the right answer.

The rule of memory barriers is the ALWAYS come in pairs.  This simple
example guarantees that Thread2 will read "tmp_a" == 1 if it
previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
"tmp_b" != 1 is still possible.

Thread1:
a = 1;
write_barrier();
b = 1;

Thread2:
tmp_b = b;
read_barrier();
tmp_a = a;

I think there's a Documentation/memory-barriers.txt file in the kernel
source code with more helpful info.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 23:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v39eez1ph.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 12:14:34PM -0700, Junio C Hamano wrote:

> > Minor nit, but this is just for diff, so it is not about clean/smudge
> > but rather about doing something like textconv.
> 
> I can understand if some tools in the Windows land prefer to work with
> these encodings, so clean/smudge to have the checkout in these encodings
> would be a reasonable thing not just diff but things like grep. On the
> other hand, I do doubt the sanity of these people if they want to have
> in-repository representation also in these encodings.

I'm pretty much of the same mind. We do have people with utf-16 in their
repositories on github. I have no idea why they do such a thing, or what
kinds of tricks they do to make it usable (because without it, they just
get "binary files differ").

My interest is to make things like bare-repository diff (and everything
built on it; i.e., things like github, gitweb, or whatever) do the sane
thing for these people, even if I think what they're doing is wrong. And
as always, I try to structure the git portions of that as much as
possible to be general and help everybody, so they can be pushed
upstream (also, then I don't have to worry about managing local changes
:) ).

But it sounds like this is probably just too ugly and should end up as a
github-specific thing.

-Peff

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  0:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> My interest is to make things like bare-repository diff (and everything
> built on it; i.e., things like github, gitweb, or whatever) do the sane
> thing for these people, even if I think what they're doing is wrong.

I do not think we are talking about right or wrong. I was primarily saying
that textconv may not be the right thing (think github/gitweb showing blob
contents, nicely formatted inside the chrome the site provides).

The solution you suggested feels like a gross layering violation, unless
we do it everywhere, in which case I wouldn't mind too much.

We have in-repository representation that diff and grep and friends work
on, and output conversion layer that externalizes the result of them in
the form of "smudge". Another layer above the in-repository representation
and below operations could convert UTF-16 to UTF-8 when going outward and
in the opposite when going inward.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-28  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v1utyx9ri.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 05:03:29PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > My interest is to make things like bare-repository diff (and everything
> > built on it; i.e., things like github, gitweb, or whatever) do the sane
> > thing for these people, even if I think what they're doing is wrong.
> 
> I do not think we are talking about right or wrong. I was primarily saying
> that textconv may not be the right thing (think github/gitweb showing blob
> contents, nicely formatted inside the chrome the site provides).

But I think it is probably a wrong thing to store utf-16 as the
canonical format inside the git repository. Git simply can't handle it
for diffing. And the right thing, as you suggested, is clean/smudge.

But I'm dealing with repositories on the server side, where it is too
late to do clean/smudge; I just get whatever junk people commited.

> We have in-repository representation that diff and grep and friends work
> on, and output conversion layer that externalizes the result of them in
> the form of "smudge". Another layer above the in-repository representation
> and below operations could convert UTF-16 to UTF-8 when going outward and
> in the opposite when going inward.

I'm not sure that could sanely be done in a backwards compatible way.
Doing it with just textual diffs is a hack, of course, but at least we
know that the damage is limited, and the diff we generate on top doesn't
care that much about the original sha1s[1]. But should read_object_sha1
learn to convert utf-16 into utf-8? I think madness lies that way, as
we are breaking assumptions about sha1 validity.

-Peff

[1] Actually, the text diff does mention the original and resulting
sha1s, which would now either bear no relation to the diff text, or bear
no relation to what's in the repo. Either way, I think we are creating
something that can't necessarily be applied, which is bad. And is why I
thought of textconv, which is basically the same concept (and has the
same problems).

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28  0:24 UTC (permalink / raw)
  To: git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

On 10/28/2011 12:50 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Gelonida N <gelonida@gmail.com> writes:
>>
>>> Is there any trick to get the help text of git branch without
>>> having to type
>>>
>>> git branch --help
>>
>> How about "git help branch"?
> 
> It was bad of me to write a tongue-in-cheek answer, get distracted and
> ended up sending it without the real answer.
No issues. At least I got an answer :-)
> 
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.
> 
> To see why, imagine you have configured an alias that is not a simple and
> stupid substitution "b == branch", but something like "bt == branch -t",
> and then want to know what you should write after "git bt".  Giving the
> manpage for branch without giving them any hint that they configured that
> alias to produce customized behaviour that is different from plain vanilla
> "branch" is not quite acceptable.
> 
> I think you _could_ make a patch that special cases a simple and straight
> substitution and skip the "foo is aliased to bar" step, but I doubt it is
> worth it to lose consistency between "git b --help" vs "git bt --help"
> that way.


I understand the reasoning and agree, that as general case it might not
be a good idea to have the behaviour, taht I expected to be the default
behaviour

It's just, that I am lazy so I shortened  my most common commands

For example:
co=checkout
ci=commit
b=branch
sm=submodule

However I'm not smart enough to remember all options for all commands
and so I have to ask git for occasional help.


It feels so 'unlazy' to have to type the full command 'just' to get help


What I wondered though is whether there couldn't be a way to be lazy and
avoid confusion.

1.) prompt for full help
-------------------------

One option would be to display what the command is aliased to and to
prompt whether one wants to see the full help of the base command.

`git b' is aliased to `branch'
pleaset press h if you want to see the full help for git branch
or press q to quit


2.) allow copy paste by displaying the command to be typed for help.
---------------------------------------------------------------------
display sometihing like:

`git b' is aliased to `branch'
for help about the branch command type:  git --help


Then lazy people could copy paste the command instead of typing it.


3.) special alias syntax to allow 'forwarding' of --help
------------------------------------------------------------

some kind of special alias syntax, which explicitely forces that --help
is 'forwarded' to the alias and not treated by git


example:

b = branch   # --help would have the current behaviour

I tried following to see whether '--help' is now being 'forwarded'

b = ! git branch
but also  here --help is treated by git

A possibility could be another 'special' character. For the sake of the
example I used '?'
b = ?branch # 'the question mark would allow 'forwarding' the
             # --help option to the alias'

b = ?! git branch



My workaround
------------

Fort the time being I might do something like

bhlp = branch --help

I would have prefered b_hlp or b_h, but it seems underscore isn't
allowed in an alias.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Miles Bader @ 2011-10-28  1:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Nguyen Thai Ngoc Duy,
	Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:
> We do have people with utf-16 in their repositories on github. I
> have no idea why they do such a thing, or what kinds of tricks they
> do to make it usable (because without it, they just get "binary
> files differ").

Hmm, you could ask them ...  [or, I suppose more diplomatically, post
a blog entry asking "Hey all you people who use github for utf-16
encoded files, ..."]

-Miles

-- 
Year, n. A period of three hundred and sixty-five disappointments.

^ permalink raw reply

* Re: git alias and --help
From: Miles Bader @ 2011-10-28  1:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gelonida N, git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
>>> git branch --help
>>
>> How about "git help branch"?
>
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.

I agree with Gelonida's followup:  although what you say makes sense,
it's still pretty annoying behavior for the very common case of a
simple renaming alias...

E.g., I have "co" aliased to "checkout", and so my fingers are very
very inclined to say "co" when I mean checkout... including when
asking for help.  I actually end up typing "git co --help", grumbling,
and retyping with the full command name, quite reguarly.

What I've often wished is that git's help system would output
something like:

   $ git help co
   `git co' is aliased to `checkout'

   Here's the help entry for `checkout':

   GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)

   NAME
          git-checkout - Checkout a branch or paths to the working tree
   ...

[with the "`git co' is aliased ..." header included in the pager
output.]

Of course, that would be the wrong thing for somebody that just wants
to be reminded what an alias expands too, but my intuition is that
this is a very tiny minority compared to people that want to examine
the options for the underlying command...

-Miles

-- 
Suburbia: where they tear out the trees and then name streets after them.

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-28  4:05 UTC (permalink / raw)
  To: Miles Bader; +Cc: Gelonida N, git
In-Reply-To: <buoty6t9937.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Of course, that would be the wrong thing for somebody that just wants
> to be reminded what an alias expands too, but my intuition is that
> this is a very tiny minority compared to people that want to examine
> the options for the underlying command...

And it is doubly wrong if help backend is configured to be anything but
manpages, no?

As I said, you should be able to come up with a patch that detects and
special cases the no frills case (replacement to single token) to get what
you want.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox