Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Junio C Hamano @ 2011-10-27 22:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Shawn O. Pearce
In-Reply-To: <CACBZZX6Ss4jLtdrDhLUNKCUjEHjHHKzfv-LMkOyTPDhRUXm4sQ@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
> packing down to this commit.

Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
objects unnecessarily, 2011-10-18) help?

^ permalink raw reply

* Re: [PATCH 4/4] pack-objects: rewrite add_descendants_to_write_order() iteratively
From: Junio C Hamano @ 2011-10-27 22:13 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-4-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> This removes the need to call this function recursively, shinking the
> code size slightly and netting a small performance increase.
>
> Signed-off-by: Dan McGee <dpmcgee@gmail.com>

Tricky.

As long as this is done after compute_write_order() populates
delta_sibling vs delta link in a consistent way, the new logic should
produce exactly the same result as the original code.

Thanks.

^ permalink raw reply

* git alias and --help
From: Gelonida N @ 2011-10-27 22:20 UTC (permalink / raw)
  To: git

I'm having a tiny question about git aliases.

in ~/.gitconfig I defined an alias

b = branch



This works well execp if I type


git b --help

Now I get the output
`git b' is aliased to `branch'


which is nice to know however what I reall wanted is to get the helptext
for the branch command.


Is there any trick to get the help text of git branch without
having to type

git branch --help

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-10-27 22:26 UTC (permalink / raw)
  To: Dan McGee; +Cc: git
In-Reply-To: <1318915284-6361-3-git-send-email-dpmcgee@gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

> Two optimizations take place here- we can start our objects array
> iteration from a known point where we left off before we started trying
> to find our tags,

This I would understand (but I am somewhat curious how much last_untagged
would advance relative to nr_objects for this half of the optimization to
be worth it), but...

> and we don't need to do the deep dives required by
> add_family_to_write_order() if the object has already been marked as
> filled.

I am not sure if this produces the identical result that was benchmarked
in the original series.

For example, if you have a tagged object that is not a commit (say a
blob), you would have written that blob in the second phase (write tagged
objects together), so the family of blobs that share same delta parent as
that blob will not be written in this "Finally all the rest" in the right
place in the original list, no?

I do not think this change would forget to fill an object that needs to be
filled, but it would affect the resulting ordering of the list, so...

> @@ -560,8 +561,13 @@ static struct object_entry **compute_write_order(void)
>  	/*
>  	 * Finally all the rest in really tight order
>  	 */
> -	for (i = 0; i < nr_objects; i++)
> -		add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	for (i = last_untagged; i < nr_objects; i++) {
> +		if (!objects[i].filled)
> +			add_family_to_write_order(wo, &wo_end, &objects[i]);
> +	}
> +
> +	if(wo_end != nr_objects)
> +		die("ordered %u objects, expected %u", wo_end, nr_objects);

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:28 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <j8clg9$ldh$1@dough.gmane.org>

Gelonida N <gelonida@gmail.com> writes:

> Is there any trick to get the help text of git branch without
> having to type
>
> git branch --help

How about "git help branch"?

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Jakub Narebski @ 2011-10-27 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Shawn O. Pearce
In-Reply-To: <CACBZZX7tghoHhxCygEj9DZSxvKyTvybawVA2HwHBkjBaH73Ujg@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Oct 27, 2011 at 23:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> > Actually it just seems slow in general, not just on repositories with
> > a lot of tags, on linux-2.6.git with this patch:
> 
> Here's profiling with gprof for everything with >1% of execution time
> with the patch applied:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.07     15.99    15.99  2184059     0.00     0.00  add_descendants_to_write_order
>      20.25     31.35    15.37 1146371554     0.00     0.00  add_to_write_order
[...]
> 
> And without:
> 
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls   s/call   s/call  name
>      21.29      9.13     9.13 142180385     0.00     0.00  hashcmp
>      10.59     13.67     4.54 90592818     0.00     0.00  lookup_object
[...]

Errr... do or do not gprof results include time spend in libraries?
Though that might not matter for this case.

Can you repeat profiling using "perf events" or something using it
(e.g. via PAPI library like HPCToolkit)?

-- 
Jakub Narębski

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-27 22:50 UTC (permalink / raw)
  To: Gelonida N; +Cc: git
In-Reply-To: <7vfwiexe6m.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Gelonida N <gelonida@gmail.com> writes:
>
>> Is there any trick to get the help text of git branch without
>> having to type
>>
>> git branch --help
>
> How about "git help branch"?

It was bad of me to write a tongue-in-cheek answer, get distracted and
ended up sending it without the real answer.

The reason why we do not do what you seem to be suggesting is because
giving the same behaviour to "git b --help" as "git branch --help" is
wrong.

To see why, imagine you have configured an alias that is not a simple and
stupid substitution "b == branch", but something like "bt == branch -t",
and then want to know what you should write after "git bt".  Giving the
manpage for branch without giving them any hint that they configured that
alias to produce customized behaviour that is different from plain vanilla
"branch" is not quite acceptable.

I think you _could_ make a patch that special cases a simple and straight
substitution and skip the "foo is aliased to bar" step, but I doubt it is
worth it to lose consistency between "git b --help" vs "git bt --help"
that way.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Atsushi Nakagawa @ 2011-10-27 23:00 UTC (permalink / raw)
  To: kusmabite; +Cc: Kyle Moffett, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <CABPQNSY9LdqOK=1nN_61ZRMv-ieXzSDYgNsQe0w21RAOw_D7yA@mail.gmail.com>

Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> >>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> >>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> >>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> >>>>>> +{
> >>>>>> + [snip]
> >>>>>
> >>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
> >>> [snip]
> >>>
> >>>  if (mutex->autoinit) {
> >>>
> >>> Assume two threads enter this block.
> >>>
> >>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> >>>
> >>> Only one thread, A, say on CPU A, will enter this block.
> >>>
> >>>        InitializeCriticalSection(&mutex->cs);
> >>>
> >>> Thread A writes some values. Note that there are no memory barriers
> >>> involved here. Not that I know of or that they would be documented.
> >>>
> >>>        mutex->autoinit = 0;
> >>>
> >>> And it writes another one. Thread A continues below to contend for the
> >>> mutex it just initialized.
> >>>
> >>>     } else
> >>>
> >>> Meanwhile, thread B, say on CPU B, spins in this loop:
> >>>
> >>>        while (mutex->autoinit != 0)
> >>>           ; /* wait for other thread */
> >>>
> >>> When thread B arrives here, it sees the value of autoinit that thread A
> >>> has written above.
> >>>
> >>> [snip]
> >>>
> >>
> >> Thanks for pointing this out, I completely forgot about write re-ordering.
> >>
> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> >> InterlockedExchange generates a full memory barrier:
> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
> >
> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >
> > A write barrier in one thread is only effective if it is paired with a
> > read barrier in the other thread.
> >
> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> > you don't have any guaranteed ordering.

Out of curiosity, where could re-ordering be a problem here?  I'm
thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
of "mutex->cs" not being propagated to the waiting thread.  However,
shouldn't that be a non-problem, as far as compiler reordering goes,
because it's an external function call and only the address of mutex->cs
is passed?

The only other cause I could think of is if ordering at the CPU was
somehow different (it could be if there're no special provisions for
calling external functions) or if "InterlockedExchange(&mutex->autoinit,
0)" wasn't atomic in updating autoinit and doing the memory barrier.

Either way, I couldn't vouch for the safety of the above logic without
a memory barrier so this question is purely of an academical nature. :)

> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
> 
> OK, so I should probably do something like this instead?
> 
> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
> 	; /* wait for other thread */

Technically, assuming only the updating of "mutex->cs" is in question,
the ICE should only be required once after exiting the loop...

There's a question of the propagation of the value of "mutex->autoinit"
itself, but my take is that the memory barrier on the writing thread
will push out the updated value across all CPUs, thus preventing an
infinite loop.  The other factors, value caching and loop optimization
by the compiler, should be prevented by the "volatile" keyword even with
gcc or MSVC 2003.

> I really appreciate getting some extra eyes on this, thanks.
> Concurrent programming is not my strong-suit (as this exercise has
> shown) ;)

So would I. :)

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Kyle Moffett @ 2011-10-27 23:20 UTC (permalink / raw)
  To: Atsushi Nakagawa
  Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <20111028080033.57FC.B013761@chejz.com>

On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
>> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> >> Thanks for pointing this out, I completely forgot about write re-ordering.
>> >>
>> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> >> InterlockedExchange generates a full memory barrier:
>> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>> >
>> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>> >
>> > A write barrier in one thread is only effective if it is paired with a
>> > read barrier in the other thread.
>> >
>> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
>> > you don't have any guaranteed ordering.
>
> Out of curiosity, where could re-ordering be a problem here?  I'm
> thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> of "mutex->cs" not being propagated to the waiting thread.  However,
> shouldn't that be a non-problem, as far as compiler reordering goes,
> because it's an external function call and only the address of mutex->cs
> is passed?
>
> The only other cause I could think of is if ordering at the CPU was
> somehow different (it could be if there're no special provisions for
> calling external functions) or if "InterlockedExchange(&mutex->autoinit,
> 0)" wasn't atomic in updating autoinit and doing the memory barrier.
>
> Either way, I couldn't vouch for the safety of the above logic without
> a memory barrier so this question is purely of an academical nature. :)
>
>> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
>>
>> OK, so I should probably do something like this instead?
>>
>> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
>>       ; /* wait for other thread */
>
> Technically, assuming only the updating of "mutex->cs" is in question,
> the ICE should only be required once after exiting the loop...
>
> There's a question of the propagation of the value of "mutex->autoinit"
> itself, but my take is that the memory barrier on the writing thread
> will push out the updated value across all CPUs, thus preventing an
> infinite loop.  The other factors, value caching and loop optimization
> by the compiler, should be prevented by the "volatile" keyword even with
> gcc or MSVC 2003.
>
>> I really appreciate getting some extra eyes on this, thanks.
>> Concurrent programming is not my strong-suit (as this exercise has
>> shown) ;)
>
> So would I. :)

Ok, so here's the race condition:

Thread1				Thread2
				/* Speculative prefetch */
				prefetch(*mutex);

if (mutex->autoinit) {
if (ICE(&mutex->autoinit, -1, 1) != -1) {
/* Now mutex->autoinit == -1 */
pthread_mutex_init(mutex, NULL);
/* This forces writes out to memory */
ICE(&mutex->autoinit, 0, -1);

				if (mutex->autoinit) {} /* false */
				/* No read barrier here */
				EnterCriticalSection(&mutex->cs);
				/* Use cached mutex->cs from earlier */

Even though you forced the memory write to be ordered in Thread 1 you
did not ensure that the read of autoinit occurred before the read of
mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
does is follow a pointer or read a mutex key value in mutex->cs then
won't necessarily get the right answer.

The rule of memory barriers is the ALWAYS come in pairs.  This simple
example guarantees that Thread2 will read "tmp_a" == 1 if it
previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
"tmp_b" != 1 is still possible.

Thread1:
a = 1;
write_barrier();
b = 1;

Thread2:
tmp_b = b;
read_barrier();
tmp_a = a;

I think there's a Documentation/memory-barriers.txt file in the kernel
source code with more helpful info.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-27 23:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v39eez1ph.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 12:14:34PM -0700, Junio C Hamano wrote:

> > Minor nit, but this is just for diff, so it is not about clean/smudge
> > but rather about doing something like textconv.
> 
> I can understand if some tools in the Windows land prefer to work with
> these encodings, so clean/smudge to have the checkout in these encodings
> would be a reasonable thing not just diff but things like grep. On the
> other hand, I do doubt the sanity of these people if they want to have
> in-repository representation also in these encodings.

I'm pretty much of the same mind. We do have people with utf-16 in their
repositories on github. I have no idea why they do such a thing, or what
kinds of tricks they do to make it usable (because without it, they just
get "binary files differ").

My interest is to make things like bare-repository diff (and everything
built on it; i.e., things like github, gitweb, or whatever) do the sane
thing for these people, even if I think what they're doing is wrong. And
as always, I try to structure the git portions of that as much as
possible to be general and help everybody, so they can be pushed
upstream (also, then I don't have to worry about managing local changes
:) ).

But it sounds like this is probably just too ugly and should end up as a
github-specific thing.

-Peff

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  0:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> My interest is to make things like bare-repository diff (and everything
> built on it; i.e., things like github, gitweb, or whatever) do the sane
> thing for these people, even if I think what they're doing is wrong.

I do not think we are talking about right or wrong. I was primarily saying
that textconv may not be the right thing (think github/gitweb showing blob
contents, nicely formatted inside the chrome the site provides).

The solution you suggested feels like a gross layering violation, unless
we do it everywhere, in which case I wouldn't mind too much.

We have in-repository representation that diff and grep and friends work
on, and output conversion layer that externalizes the result of them in
the form of "smudge". Another layer above the in-repository representation
and below operations could convert UTF-16 to UTF-8 when going outward and
in the opposite when going inward.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Jeff King @ 2011-10-28  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð
In-Reply-To: <7v1utyx9ri.fsf@alter.siamese.dyndns.org>

On Thu, Oct 27, 2011 at 05:03:29PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > My interest is to make things like bare-repository diff (and everything
> > built on it; i.e., things like github, gitweb, or whatever) do the sane
> > thing for these people, even if I think what they're doing is wrong.
> 
> I do not think we are talking about right or wrong. I was primarily saying
> that textconv may not be the right thing (think github/gitweb showing blob
> contents, nicely formatted inside the chrome the site provides).

But I think it is probably a wrong thing to store utf-16 as the
canonical format inside the git repository. Git simply can't handle it
for diffing. And the right thing, as you suggested, is clean/smudge.

But I'm dealing with repositories on the server side, where it is too
late to do clean/smudge; I just get whatever junk people commited.

> We have in-repository representation that diff and grep and friends work
> on, and output conversion layer that externalizes the result of them in
> the form of "smudge". Another layer above the in-repository representation
> and below operations could convert UTF-16 to UTF-8 when going outward and
> in the opposite when going inward.

I'm not sure that could sanely be done in a backwards compatible way.
Doing it with just textual diffs is a hack, of course, but at least we
know that the damage is limited, and the diff we generate on top doesn't
care that much about the original sha1s[1]. But should read_object_sha1
learn to convert utf-16 into utf-8? I think madness lies that way, as
we are breaking assumptions about sha1 validity.

-Peff

[1] Actually, the text diff does mention the original and resulting
sha1s, which would now either bear no relation to the diff text, or bear
no relation to what's in the repo. Either way, I think we are creating
something that can't necessarily be applied, which is bad. And is why I
thought of textconv, which is basically the same concept (and has the
same problems).

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28  0:24 UTC (permalink / raw)
  To: git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

On 10/28/2011 12:50 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Gelonida N <gelonida@gmail.com> writes:
>>
>>> Is there any trick to get the help text of git branch without
>>> having to type
>>>
>>> git branch --help
>>
>> How about "git help branch"?
> 
> It was bad of me to write a tongue-in-cheek answer, get distracted and
> ended up sending it without the real answer.
No issues. At least I got an answer :-)
> 
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.
> 
> To see why, imagine you have configured an alias that is not a simple and
> stupid substitution "b == branch", but something like "bt == branch -t",
> and then want to know what you should write after "git bt".  Giving the
> manpage for branch without giving them any hint that they configured that
> alias to produce customized behaviour that is different from plain vanilla
> "branch" is not quite acceptable.
> 
> I think you _could_ make a patch that special cases a simple and straight
> substitution and skip the "foo is aliased to bar" step, but I doubt it is
> worth it to lose consistency between "git b --help" vs "git bt --help"
> that way.


I understand the reasoning and agree, that as general case it might not
be a good idea to have the behaviour, taht I expected to be the default
behaviour

It's just, that I am lazy so I shortened  my most common commands

For example:
co=checkout
ci=commit
b=branch
sm=submodule

However I'm not smart enough to remember all options for all commands
and so I have to ask git for occasional help.


It feels so 'unlazy' to have to type the full command 'just' to get help


What I wondered though is whether there couldn't be a way to be lazy and
avoid confusion.

1.) prompt for full help
-------------------------

One option would be to display what the command is aliased to and to
prompt whether one wants to see the full help of the base command.

`git b' is aliased to `branch'
pleaset press h if you want to see the full help for git branch
or press q to quit


2.) allow copy paste by displaying the command to be typed for help.
---------------------------------------------------------------------
display sometihing like:

`git b' is aliased to `branch'
for help about the branch command type:  git --help


Then lazy people could copy paste the command instead of typing it.


3.) special alias syntax to allow 'forwarding' of --help
------------------------------------------------------------

some kind of special alias syntax, which explicitely forces that --help
is 'forwarded' to the alias and not treated by git


example:

b = branch   # --help would have the current behaviour

I tried following to see whether '--help' is now being 'forwarded'

b = ! git branch
but also  here --help is treated by git

A possibility could be another 'special' character. For the sake of the
example I used '?'
b = ?branch # 'the question mark would allow 'forwarding' the
             # --help option to the alias'

b = ?! git branch



My workaround
------------

Fort the time being I might do something like

bhlp = branch --help

I would have prefered b_hlp or b_h, but it seems underscore isn't
allowed in an alias.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Miles Bader @ 2011-10-28  1:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Nguyen Thai Ngoc Duy,
	Ævar Arnfjörð
In-Reply-To: <20111027234429.GA28187@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:
> We do have people with utf-16 in their repositories on github. I
> have no idea why they do such a thing, or what kinds of tricks they
> do to make it usable (because without it, they just get "binary
> files differ").

Hmm, you could ask them ...  [or, I suppose more diplomatically, post
a blog entry asking "Hey all you people who use github for utf-16
encoded files, ..."]

-Miles

-- 
Year, n. A period of three hundred and sixty-five disappointments.

^ permalink raw reply

* Re: git alias and --help
From: Miles Bader @ 2011-10-28  1:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gelonida N, git
In-Reply-To: <7v8vo6xd4u.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
>>> git branch --help
>>
>> How about "git help branch"?
>
> The reason why we do not do what you seem to be suggesting is because
> giving the same behaviour to "git b --help" as "git branch --help" is
> wrong.

I agree with Gelonida's followup:  although what you say makes sense,
it's still pretty annoying behavior for the very common case of a
simple renaming alias...

E.g., I have "co" aliased to "checkout", and so my fingers are very
very inclined to say "co" when I mean checkout... including when
asking for help.  I actually end up typing "git co --help", grumbling,
and retyping with the full command name, quite reguarly.

What I've often wished is that git's help system would output
something like:

   $ git help co
   `git co' is aliased to `checkout'

   Here's the help entry for `checkout':

   GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)

   NAME
          git-checkout - Checkout a branch or paths to the working tree
   ...

[with the "`git co' is aliased ..." header included in the pager
output.]

Of course, that would be the wrong thing for somebody that just wants
to be reminded what an alias expands too, but my intuition is that
this is a very tiny minority compared to people that want to examine
the options for the underlying command...

-Miles

-- 
Suburbia: where they tear out the trees and then name streets after them.

^ permalink raw reply

* Re: git alias and --help
From: Junio C Hamano @ 2011-10-28  4:05 UTC (permalink / raw)
  To: Miles Bader; +Cc: Gelonida N, git
In-Reply-To: <buoty6t9937.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Of course, that would be the wrong thing for somebody that just wants
> to be reminded what an alias expands too, but my intuition is that
> this is a very tiny minority compared to people that want to examine
> the options for the underlying command...

And it is doubly wrong if help backend is configured to be anything but
manpages, no?

As I said, you should be able to come up with a patch that detects and
special cases the no frills case (replacement to single token) to get what
you want.

^ permalink raw reply

* Re: [PATCH 00/22] Refactor to accept NUL in commit messages
From: Junio C Hamano @ 2011-10-28  4:07 UTC (permalink / raw)
  To: Miles Bader
  Cc: Jeff King, git, Nguyen Thai Ngoc Duy,
	Ævar Arnfjörð
In-Reply-To: <buo39edao6r.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Jeff King <peff@peff.net> writes:
>> We do have people with utf-16 in their repositories on github. I
>> have no idea why they do such a thing, or what kinds of tricks they
>> do to make it usable (because without it, they just get "binary
>> files differ").
>
> Hmm, you could ask them ...  [or, I suppose more diplomatically, post
> a blog entry asking "Hey all you people who use github for utf-16
> encoded files, ..."]

People will hate such a flag day event initially and later thank you for
it ;-) 

I am afraid that that is a wishful thinking, though.

^ permalink raw reply

* [RFC/PATCH] define the way new representation types are encoded in the pack
From: Junio C Hamano @ 2011-10-28  6:04 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King

In addition to four basic types (commit, tree, blob and tag), the pack
stream can encode a few other "representation" types, such as REF_DELTA
and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
we do not have much room to add new representation types in place, but we
do have one value reserved for future expansion.

This patch is about defining how that reserved value is used.

The first byte in the pack stream data consists of the following for the
current representation types:

 - Bit 0-3 are used for the low 4-bit of "some" size (not necessarily the
   size of the representation);

 - Bit 4-6 are used for object types 0-7, but we have not used type 5 so
   far and reserved it for future expansion (we could also use type 0
   recorded in the pack stream for future expansion, just like how I
   convert 5 into the real "extended" representation type in this patch);

 - Bit 7 is used to signal if the second byte needs to be read for sizes
   that do not fit in the 4-bit.

When bit 4-6 encodes type 5, the first byte is used this way:

 - Bit 0-3 denotes the real "extended" representation type. Because types
   0-7 can already be encoded without using the extended format, we can
   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
   type 11 = 3 + 8);

 - Bit 4-6 has the value "5";

 - Bit 7 is used to signal if the _third_ byte needs to be read for larger
   size that cannot be represented with 8-bit.

As it is unlikely for us to pack things that do not need to record any
size, the second byte is always used in full to encode the low 8-bit of
the size.

I haven't started using type=8 and upwards for anything yet, but because
we have only one "future expansion" value left, I want us to be extremely
careful in order to avoid painting us into a corner that we cannot get out
of, so I am sending this out early for a preliminary review.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h     |    3 ++-
 sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 2e6ad36..b02139b 100644
--- a/cache.h
+++ b/cache.h
@@ -380,9 +380,10 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_EXT = 5, /* 5 for future expansion */
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
+	OBJ_CAT_TREE = 8,
 	OBJ_ANY,
 	OBJ_MAX
 };
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..4dcd023 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1254,16 +1254,43 @@ static int experimental_loose_object(unsigned char *map)
 }
 
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
-		unsigned long len, enum object_type *type, unsigned long *sizep)
+	unsigned long len, enum object_type *typep, unsigned long *sizep)
 {
 	unsigned shift;
 	unsigned long size, c;
 	unsigned long used = 0;
+	enum object_type type;
 
+	/*
+	 * MSB of the first byte is used to tell if the second byte
+	 * needs to be read for the size, so type field is only 3-bit
+	 * wide.
+	 */
 	c = buf[used++];
-	*type = (c >> 4) & 7;
-	size = c & 15;
-	shift = 4;
+	type = (c >> 4) & 7;
+
+	if (type != OBJ_EXT) {
+		/*
+		 * For basic types of object representations, the low
+		 * 4-bit of the first byte is used for the lowermost
+		 * 4-bit of the size. The MSB of the first byte tells
+		 * if the second byte needs to be read for size.
+		 */
+		size = c & 15;
+		shift = 4;
+	} else {
+		/*
+		 * For extended types, the low 4-bit of the first byte
+		 * is used for the representation type (offset by 8),
+		 * and the size begins at the second byte. The MSB of
+		 * the first byte is still used to indicate the next
+		 * byte (i.e. the third byte) needs to be read for the
+		 * size.
+		 */
+		type = (c & 15) + 8;
+		size = buf[used++];
+		shift = 8;
+	}
 	while (c & 0x80) {
 		if (len <= used || bitsizeof(long) <= shift) {
 			error("bad object header");
@@ -1274,6 +1301,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		shift += 7;
 	}
 	*sizep = size;
+	*typep = type;
 	return used;
 }
 

^ permalink raw reply related

* Re: [PATCHv2 3/3] completion: match ctags symbol names in grep patterns
From: Jeff King @ 2011-10-28  6:05 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20111023212928.GG22551@goldbirke>

On Sun, Oct 23, 2011 at 11:29:28PM +0200, SZEDER Gábor wrote:

> On Fri, Oct 21, 2011 at 01:30:21PM -0400, Jeff King wrote:
> > This incorporates the suggestions from Gábor's review, with one
> > exception: it still looks only in the current directory for the "tags"
> > files. I think that might have some performance implications, so I'd
> > rather add it separately, if at all.
> 
> I agree that scanning through a whole working tree for tags files
> would cost too much.  But I think that a tags file at the top of the
> working tree is common enough to be supported, and checking its
> existence is fairly cheap.

Actually, it's not too expensive. Asking git for the top of the working
tree means it has to traverse up to there anyway. So the trick is just
doing our search without invoking too many external tools which would
cause unnecessary forks.

The patch is below, but I'm still not sure it's a good idea.

Grep only looks in the current subdirectory for matches. So if I am in
"src/foo/bar/", and the tags file is in "src/", then is it really a good
way to generate completions? Most of what's in the tags file will
probably be in _other_ subdirectories of "src/", and won't be matched by
grep at all. So we will give useless entries to bash to complete.

> So how about something like this for the case arm? (I didn't actually
> tested it.)
> 
> 		local tagsfile
> 		if test -r tags; then
> 			tagsfile=tags
> 		else
> 			local dir="$(__gitdir)"

You don't want __gitdir here, but rather "git rev-parse --show-cdup".

> Btw, there is a bug in the case statement: 'git --no-pager grep <TAB>'
> offers refs instead of symbols, because $cword is not 2 and $prev
> doesn't start with a dash.  But it's not worse than the current
> behavior, so I don't think this bug is a show-stopper for the patch.

Yeah. The intent of the "$cword is 2" thing is to know that we are the
first non-option argument. Arguably, _get_comp_words_by_ref should
somehow tell us which position we are completing relative to the actual
command name.

-Peff

---
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index af283cb..b0ed657 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1429,6 +1429,39 @@ _git_gitk ()
 	_gitk
 }
 
+__git_cdup_dirs() {
+	local prefix=$(git rev-parse --show-cdup 2>/dev/null)
+	local oldifs=$IFS
+	local dots
+	local i
+	IFS=/
+	for i in $prefix; do
+		dots=$dots../
+		echo "$dots"
+	done
+	IFS=$oldifs
+}
+
+__git_find_in_cdup_one() {
+	local dir=$1; shift
+	for i in "$@"; do
+		if test -r "$dir$i"; then
+			echo "$dir$i"
+			return 0
+		fi
+	done
+	return 1
+}
+
+__git_find_in_cdup() {
+	__git_find_in_cdup_one "" "$@" && return
+
+	local dir
+	for dir in $(__git_cdup_dirs); do
+		__git_find_in_cdup_one "$dir" "$@" && return
+	done
+}
+
 __git_match_ctag() {
 	awk "/^${1////\\/}/ { print \$1 }" "$2"
 }
@@ -1457,8 +1490,9 @@ _git_grep ()
 
 	case "$cword,$prev" in
 	2,*|*,-*)
-		if test -r tags; then
-			__gitcomp "$(__git_match_ctag "$cur" tags)"
+		local tags=$(__git_find_in_cdup tags)
+		if test -n "$tags"; then
+			__gitcomp "$(__git_match_ctag "$cur" "$tags")"
 			return
 		fi
 		;;

^ permalink raw reply related

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Junio C Hamano @ 2011-10-28  6:12 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King
In-Reply-To: <7v62j9veh3.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> I haven't started using type=8 and upwards for anything yet, but because
> we have only one "future expansion" value left, I want us to be extremely
> careful in order to avoid painting us into a corner that we cannot get out
> of, so I am sending this out early for a preliminary review.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  cache.h     |    3 ++-
>  sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 2e6ad36..b02139b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -380,9 +380,10 @@ enum object_type {
>  	OBJ_TREE = 2,
>  	OBJ_BLOB = 3,
>  	OBJ_TAG = 4,
> -	/* 5 for future expansion */
> +	OBJ_EXT = 5, /* 5 for future expansion */
>  	OBJ_OFS_DELTA = 6,
>  	OBJ_REF_DELTA = 7,
> +	OBJ_CAT_TREE = 8,
>  	OBJ_ANY,
>  	OBJ_MAX
>  };

As people may be able to guess from the name, CAT_TREE is envisioned to
encode a large data (primarily of type "blob") by recording the object
name of a tree object and probably the total length, and would represent
the concatenation of all blobs contained in the tree object when the tree
is traversed in some fixed order (e.g. Avery's "bup split"). I am guessing
that the payload for CAT_TREE representation type will be:

 - 20-byte object name for the top-level tree object;

 - type of the basic object (commit, tree, blob, or tag) it represents,
   even though it is unlikely that we would want to record such a large
   commit or tag that needs CAT_TREE representation;

 - the total length of the basic object it represents, even though it is
   redundant (you could traverse and sum the sizes of blobs contained in
   the tree object), it would help sha1_object_info() and friends. This
   will be the "some size" I mentioned in the previous message for this
   representation type.

We would probably add loose object representation for CAT_TREE, which may
look like:

    "cattree" <size of this cat-tree in decimal> NUL
    <basic object type> <size of the basic object> NUL
    <object name of the top-level tree>

and would need to teach unpack_sha1_file() about it. One caveat is that
we would want to keep the "contents name the object" invariant, so even if
a large blob is expressed as a CAT_TREE, its object name must still be
what we would get by hashing '"blob" <size> NUL <payload>'.

A loose object file in "cattree" representation will not hash to the value
a naïve implementation would expect, and fsck_sha1() needs to be aware of
it. I haven't thought things through in this area.

Further work would involve (no way exhaustive, of course):

 - Teach fsck and connectivity tools that objects that are reachable from
   any object (even a blob) that is represented as a CAT_TREE are needed
   and reachable by that object;

 - Teach pack-objects that anything that is represented as a CAT_TREE does
   not need to be deltified (the objects used as its representation would
   go through the usual deltification rules);

 - Teach unpack-objects to expand CAT_TREE representation into a "cattree"
   loose object.

 - Perhaps teach the attributes mechanism to lie to anybody who asks that
   any object in CAT_TREE representation is a binary file to trigger the
   "we do not unnecessarily look at binary" logic in "git diff" machinery.

 - Teach fast-import to write out CAT_TREE representation. This is
   probably the quickest and least-impact way to exploit existing support
   for large files in index_fd().

 - Update grep.c::grep_buffer() to take not <buf,size> but git_istream,
   and rewrite builtin/grep.c::grep_sha1() to use streaming interface.

^ permalink raw reply

* git rebase -p does not find mainline
From: David Kastrup @ 2011-10-28  7:51 UTC (permalink / raw)
  To: git

When doing git rebase -p in order to do a non-trivial rebase tracking
merges, I get a message about a missing -m option for specifying the
mainline.

git rebase does not have the corresponding option.  The fix is to do

git cherry-pick -m 1 offending-merge-commit
git rebase --continue

but that is quite unobvious from the resulting error message that does
not even mention "cherry-pick".  Since git rebase -p is supposed to
"preserve merges", it should just cherry-pick with the same mainline
that the original merge commit had, without asking the user questions or
even failing.

This is the version of git delivered with Ubuntu 11.10, namely 1.7.5.4.

-- 
David Kastrup

^ permalink raw reply

* Re: git alias and --help
From: Michael J Gruber @ 2011-10-28  9:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Gelonida N, git
In-Reply-To: <7vvcr9wyje.fsf@alter.siamese.dyndns.org>

Junio C Hamano venit, vidit, dixit 28.10.2011 06:05:
> Miles Bader <miles@gnu.org> writes:
> 
>> Of course, that would be the wrong thing for somebody that just wants
>> to be reminded what an alias expands too, but my intuition is that
>> this is a very tiny minority compared to people that want to examine
>> the options for the underlying command...
> 
> And it is doubly wrong if help backend is configured to be anything but
> manpages, no?
> 
> As I said, you should be able to come up with a patch that detects and
> special cases the no frills case (replacement to single token) to get what
> you want.

But "help" is still too much to type for the OP ;) How about this in
your config:

[alias]
	h = help
	hh = "!sh -c 'a=$(git config --get alias.$1); : ${a:=$1}; git help
${a%% *}' -"

Ugly as hell, I know, and works only for aliases whose first word is the
name of a git command, as well as for non-aliases. Catching "!command"
type aliases is left as an exercise to the reader.

Michael

^ permalink raw reply

* Re: [PATCH 2/2] pack-objects: optimize "recency order"
From: Ævar Arnfjörð Bjarmason @ 2011-10-28  9:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce, Dan McGee
In-Reply-To: <7vy5w6xf7n.fsf@alter.siamese.dyndns.org>

On Fri, Oct 28, 2011 at 00:05, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> We recently upgraded to 1.7.7 and I've traced a very large slowdown in
>> packing down to this commit.
>
> Does Dan McGee's series leading to 38d4deb (pack-objects: don't traverse
> objects unnecessarily, 2011-10-18) help?

Yes it does!

When I do:

    git cherry-pick 703f05ad5835cff92b12c29aecf8d724c8c847e2..38d4deb

Here's the time on the linux-2.6.git repack with that series:

    real    0m53.969s
    user    0m47.063s
    sys     0m3.020s

On the repository I was having troubles with this was the result on
master:

    Total 911023 (delta 687744), reused 905500 (delta 683064)

    real    6m0.487s
    user    4m1.751s
    sys     1m56.331s

And with the cherry-pick:

    Total 911023 (delta 687744), reused 911023 (delta 687744)

    real    1m44.192s
    user    0m32.169s
    sys     0m4.383s

And with 1b4bb16b9ec331c91e28d2e3e7dee5070534b6a2 reverted:

    Total 911023 (delta 687744), reused 911023 (delta 687744)

    real    1m23.796s
    user    0m31.931s
    sys     0m3.549s

So it's still slower, but not intolerably slower. I'd be interested to
find out why we have that 20% difference that doesn't show up on
linux-2.6.git though, the repository is mostly source code but there
are some binary blobs in it as well, although not very large, the
overall size of the entire repository is <500MB.

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28  9:17 UTC (permalink / raw)
  To: git
In-Reply-To: <7vvcr9wyje.fsf@alter.siamese.dyndns.org>

On 10/28/2011 06:05 AM, Junio C Hamano wrote:
> Miles Bader <miles@gnu.org> writes:
> 
>> Of course, that would be the wrong thing for somebody that just wants
>> to be reminded what an alias expands too, but my intuition is that
>> this is a very tiny minority compared to people that want to examine
>> the options for the underlying command...
> 
> And it is doubly wrong if help backend is configured to be anything but
> manpages, no?

Well. Then it is already doubly wrong today, isnt it?
The output of 'git help b'
is not a man page I assume.

In my opinion chaning it to a short message + the output of a man page
is not really worse than today's behaviour.

> 
> As I said, you should be able to come up with a patch that detects and
> special cases the no frills case (replacement to single token) to get what
> you want.

That's perhaps the most comfortable one, though occasionally I am even
interested in the help text of a git command if I had an alias like

l = log --name-status

to see what other switches I could add


(Thus an alternative suggestion to display both or to have a special
syntax allowing to force displaying of a man page)



Another small detail:

Let's assume I have following alias:

log = log --name-status


In this case I directly get the help text for git log
if I typed 'git log --help' (or 'git help log').
I don't even see, that my log is in reality aliased.

^ permalink raw reply

* [PATCH v3 00/14]  Tidying up references code
From: mhagger @ 2011-10-28 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips, Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

I have re-rolled this patch series against gitster/master for your
merging enjoyment.

Original description of patch series:

This is the next installment of the reference changes that I have been
working on.  This batch includes a lot of tidying up in preparation
for the real changes.

The last few patches have a little bit of meat on them.  They start
changing the innards of refs.c to work less with strings and more with
objects.  This work will continue in later patches with the ultimate
goal of swapping the data structure used to store cached references
out from under the module--changing it from a sorted array of pointers
into a hierarchical tree shaped like the reference namespace tree.

Michael Haggerty (14):
  cache.h: add comments for git_path() and git_path_submodule()
  struct ref_entry: document name member
  refs: rename "refname" variables
  refs: rename parameters result -> sha1
  clear_ref_array(): rename from free_ref_array()
  is_refname_available(): remove the "quiet" argument
  parse_ref_line(): add docstring
  add_ref(): add docstring
  is_dup_ref(): extract function from sort_ref_array()
  refs: change signatures of get_packed_refs() and get_loose_refs()
  get_ref_dir(): change signature
  resolve_gitlink_ref(): improve docstring
  Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
  resolve_gitlink_ref_recursive(): change to work with struct ref_cache

 cache.h |   18 +++
 refs.c  |  434 +++++++++++++++++++++++++++++++++------------------------------
 refs.h  |   34 +++--
 3 files changed, 266 insertions(+), 220 deletions(-)

-- 
1.7.7

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox