Git development
* Re: Combined diff format documentation
From: Junio C Hamano @ 2006-10-25 22:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ehoo2k$1g6$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> 1. "git diff" header which looked like this
> 2. the "index" extended header line changes from
> 3. The "rename/copy" headers seem to be never present; see below.
>...

Thanks for starting this.  Your observation is correct.  It was
pretty much designed for quick _content_ inspection; renames
work correctly for picking which blobs from each tree to
compare, but are otherwise not reflected in the output (the
pathnames are not shown as far as I know).  We could probably
add that if some users need it.

> 5. Hunk header is also modified: in ordinary diff we have
> ...
>    It might not be obvious that we have (number of parents + 1) '@'
>    characters in the hunk header for the combined diff format.

Correct.  This was done to prevent people from accidentally
feeding it to "patch -p1".  In other words, we wanted to make it
so obvious that it is _not_ a patch.

There may be more information in "git log -- combine-diff.c"
output that ought to be collected into the documentation, and
now might be a good time to do so, given that that part of the
system is fairly stable and has not changed for quite some time
in git timescale.

>    BTW. it is not mentioned in documentation that git diff uses hunk section
>    indicator, and what regexp/expression it uses (and is it configurable).
>    Not described in documentation.

If you mean by "hunk section indicator" the output similar to
GNU diff -p option, I think it is not worth mentioning and we
are not ready to mention it yet (we have not etched the
expression in stone).  Nobody jumped up and down to say it needs
to be configurable, so it is left undocumented more or less
deliberately.

> 6. Documentation/diff-format.txt explains combined and condensed combined
>    format quite well, although it doesn't tell us if we can have plusses and
>    minuses together in one line...

But you already know the answer to that question, since you
asked me a few days ago ;-).

Patches to documentation would be easier to comment on and more
productive, I guess.

> Below are the following diffs: with the first parent, merge (with all
> parents) with rename detection, combined, and combined with rename
> detection. Is all of this expected?

Yes.  I do not see anything obviously unexpected in your output.



* Re: VCS comparison table
From: Jakub Narebski @ 2006-10-25 22:29 UTC (permalink / raw)
  To: git
In-Reply-To: <20061025221531.GB10140@spearce.org>

Shawn Pearce wrote:

> David Lang <dlang@digitalinsight.com> wrote:
>> a quick lesson on program naming
>> 
>> On Wed, 25 Oct 2006, Andreas Ericsson wrote:
>> 
>> >I'm personally all for a rewrite of the necessary commands in C ("commit" 
>> >comes to mind), but as many others, I have no personal interest in doing 
>> >the actual work. I'm fairly certain that once we get it working natively 
>> >on windows with some decent performance, windows hackers will pick up the 
>> >ball and write "wingit", which will be a log viewer and GUI thing for
>>              ^^^^^^
>> 
>> how many other people read this as 'wing it' rather than 'win git'? ;-)
> 
> Yes, that's certainly a less than optimal name...
> 
> What about gitk?  Is it "gi tk" or "git k" ?  This has actually
> been the source of much local debate.  :-)

You can always use CamelCase, i.e. WinGit or WinGIT (or wgit,
but this is also silly).

Cute names are taken: CoGITo, gitk, qgit (GTK+ history viewer is gitview,
not ggit, curiously ;-) and tig.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Combined diff format documentation
From: Jakub Narebski @ 2006-10-25 22:22 UTC (permalink / raw)
  To: git

In Documentation/diff-format.txt we can find the following information about
combined diff format:

 combined diff format
 --------------------
 
 git-diff-tree and git-diff-files can take '-c' or '--cc' option
 to produce 'combined diff', which looks like this: 

 ------------
 diff --combined describe.c
 @@@ +98,7 @@@
    return (a_date > b_date) ? -1 : (a_date == b_date) ? 0 : 1;
   }
 
 - static void describe(char *arg)
  -static void describe(struct commit *cmit, int last_one)
 ++static void describe(char *arg, int last_one)
   {
  +     unsigned char sha1[20];
  +     struct commit *cmit;
 ------------

It then goes on to explain how to read the combined diff format, and how
--cc (condensed combined) differs from --combined.

There is no note about which of the extended headers work with the combined
diff format, how they change, or how the hunk header changes.

From what I gathered, there are the following differences as compared to
ordinary (diff --git) git extended headers:

1. "git diff" header which looked like this

      diff --git a/file1 b/file2

    is now

      diff --combined file2

    (where instead of --combined we might have --cc). Not described in
    documentation.

2. the "index" extended header line changes from

     index <hash>..<hash> <mode>

   to
   
     index <hash>,<hash>..<hash>

   Mode information is put in separate line, only if mode changes, for
   example

     mode <mode>,<mode>..<mode>

   <mode> can be 000000 if file didn't exist in particular parent; if file
   was created by merge we have

     new file mode <mode>

   I haven't checked what happens if file is deleted, either by branch or by
   merge commit itself. Not described in documentation, I'm not sure about
   how this (wrt modes) works.

3. The "rename/copy" headers seem to be never present; see below.

4. From file/to file header _seems_ to function exactly like in ordinary
   diff format, namely

     --- a/file1
     +++ b/file2

   But it seems to function rather like in ordinary "git diff" header,
   i.e. we have a/file1 instead of /dev/null even for files created by
   merge. I have not checked if and how rename detection works here.

5. Hunk header is also modified: in ordinary diff we have

     @@ <from range> <to range> @@

   where <from range> is -<start line>,<number of lines>, and <to range>
   is +<start line>,<number of lines>. In combined diff format it changes
   similarly to "index" extended header, namely

     @@@ <from range> <from range> <to range> @@@

   It might not be obvious that we have (number of parents + 1) '@'
   characters in the hunk header for the combined diff format.

   BTW. it is not mentioned in documentation that git diff uses hunk section
   indicator, and what regexp/expression it uses (and is it configurable).
   Not described in documentation.

6. Documentation/diff-format.txt explains combined and condensed combined
   format quite well, although it doesn't tell us if we can have plusses and
   minuses together in one line...
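
The hunk header rule from point 5 can be sketched as a small parser. This is only an illustration, not code from git: the names parse_hunk_header, HUNK_RE and RANGE_RE are made up here, and it assumes each range has the form <sign><start>[,<count>] with a missing count meaning 1.

```python
import re

# Matches "@@ -1,3 +1,4 @@" as well as "@@@ -1,0 -1,1 +1,1 @@@ optional text".
HUNK_RE = re.compile(r"^(@{2,}) (.+?) (@{2,})(?: (.*))?$")
RANGE_RE = re.compile(r"^([-+])(\d+)(?:,(\d+))?$")


def parse_hunk_header(line):
    """Return (num_parents, parent_ranges, result_range, section_text)."""
    m = HUNK_RE.match(line.rstrip("\n"))
    if m is None or m.group(1) != m.group(3):
        raise ValueError("not a hunk header: %r" % (line,))
    # (number of parents + 1) '@' characters on each side.
    num_parents = len(m.group(1)) - 1
    ranges = []
    for token in m.group(2).split():
        r = RANGE_RE.match(token)
        if r is None:
            raise ValueError("bad range: %r" % (token,))
        sign, start, count = r.groups()
        ranges.append((sign, int(start), 1 if count is None else int(count)))
    # Expect one '-' range per parent followed by a single '+' range.
    if [sign for sign, _, _ in ranges] != ["-"] * num_parents + ["+"]:
        raise ValueError("unexpected ranges in: %r" % (line,))
    parents = [(start, count) for _, start, count in ranges[:-1]]
    result = ranges[-1][1:]
    return num_parents, parents, result, m.group(4) or ""


if __name__ == "__main__":
    print(parse_hunk_header("@@@ -1,0 -1,1 +1,1 @@@"))
```

An ordinary two-'@' header is handled as the one-parent case, so the same function covers both formats.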


=====================================================================

Combined diff format and rename detection
-----------------------------------------

We have the following situation:
$ git ls-tree -r --abbrev HEAD
100644 blob 1ce3f81     greetings/goodbye.txt
100644 blob 980a0d5     greetings/hello.txt
$ git ls-tree -r --abbrev HEAD^1
100644 blob 980a0d5     greetings/hello.txt
$ git ls-tree -r --abbrev HEAD^2
100644 blob 1ce3f81     data/goodbye.txt
100644 blob 980a0d5     data/hello.txt

Below are the following diffs: with the first parent, merge (with all
parents) with rename detection, combined, and combined with rename
detection. Is all of this expected?

$ git diff-tree -p HEAD^1 HEAD
diff --git a/greetings/goodbye.txt b/greetings/goodbye.txt
new file mode 100644
index 0000000..1ce3f81
--- /dev/null
+++ b/greetings/goodbye.txt
@@ -0,0 +1 @@
+Goodbye World!

$ git diff-tree -p -M -m HEAD
d0fdd886e3b768678832c8d826bb8b70f2ef7b8e
diff --git a/greetings/goodbye.txt b/greetings/goodbye.txt
new file mode 100644
index 0000000..1ce3f81
--- /dev/null
+++ b/greetings/goodbye.txt
@@ -0,0 +1 @@
+Goodbye World!
d0fdd886e3b768678832c8d826bb8b70f2ef7b8e
diff --git a/data/goodbye.txt b/greetings/goodbye.txt
similarity index 100%
rename from data/goodbye.txt
rename to greetings/goodbye.txt
diff --git a/data/hello.txt b/greetings/hello.txt
similarity index 100%
rename from data/hello.txt
rename to greetings/hello.txt

$ git diff-tree -p -c HEAD
d0fdd886e3b768678832c8d826bb8b70f2ef7b8e
diff --combined greetings/goodbye.txt
index 0000000,0000000..1ce3f81
new file mode 100644
--- a/greetings/goodbye.txt
+++ b/greetings/goodbye.txt
@@@ -1,0 -1,0 +1,1 @@@
++Goodbye World!

$ git diff-tree -p -c -M HEAD
d0fdd886e3b768678832c8d826bb8b70f2ef7b8e
diff --combined greetings/goodbye.txt
index 0000000,1ce3f81..1ce3f81
mode 000000,100644..100644
--- a/greetings/goodbye.txt
+++ b/greetings/goodbye.txt
@@@ -1,0 -1,1 +1,1 @@@
+ Goodbye World!

And for comparison, the last one with --cc (condensed combined) instead of -c:
$ git diff-tree -p --cc -M HEAD
d0fdd886e3b768678832c8d826bb8b70f2ef7b8e
diff --cc greetings/goodbye.txt
index 0000000,1ce3f81..1ce3f81
mode 000000,100644..100644
--- a/greetings/goodbye.txt
+++ b/greetings/goodbye.txt
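
The "index" and "mode" lines in the outputs above list one value per parent, then "..", then the value in the merge result. A minimal sketch of splitting them, with a made-up function name (parse_combined_header) and no claim to match how git itself parses or prints these lines:

```python
def parse_combined_header(line):
    """Split 'index a,b..c' or 'mode a,b..c' into (keyword, parent_values, result)."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword not in ("index", "mode"):
        raise ValueError("not an index/mode header: %r" % (line,))
    parents_part, sep, result_part = rest.partition("..")
    if not sep or not result_part.strip():
        raise ValueError("missing '..' or result value in: %r" % (line,))
    # An ordinary "index a..b <mode>" line carries a trailing mode; keep only
    # the first token after "..".
    result = result_part.split()[0]
    return keyword, parents_part.split(","), result


def absent_parents(mode_line):
    """Indexes of parents in which the file did not exist (mode 000000)."""
    _, parent_modes, _ = parse_combined_header(mode_line)
    return [i for i, mode in enumerate(parent_modes) if mode == "000000"]


if __name__ == "__main__":
    print(parse_combined_header("index 0000000,1ce3f81..1ce3f81"))
    print(absent_parents("mode 000000,100644..100644"))
```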
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: (unknown)
From: Junio C Hamano @ 2006-10-25 22:20 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <7v1wowm46j.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Andy Parkins <andyparkins@gmail.com> writes:
>
>> On Wednesday 2006, October 25 19:38, Junio C Hamano wrote:
>>
>>> > I did try that, but then the branches don't appear in git branch.  I
>>> > still like that they exist.
>>>
>>> "git branch -r" perhaps.
>>
>> That's pretty good.  It makes things like
>>
>>   git-log remotes/origin/master..master
>>
>> A bit long winded, but it's certainly what I asked for.
>
> "git log remotes/origin..master" perhaps?
>
> The point being, remotes/origin when origin is a directory that
> has HEAD that points at something, it stands for
> remotes/origin/HEAD.

Heh, I spoke too fast.

	"git log origin..master"

If you have none of .git/origin, .git/refs/origin,
.git/refs/tags/origin, .git/refs/heads/origin, or
.git/refs/remotes/origin, then .git/refs/remotes/origin/HEAD is
what "origin" means (see get_sha1_basic() in sha1_name.c).
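
A rough model of that lookup order, for illustration only. It is not the code in sha1_name.c: the names CANDIDATES and resolve_short_name are invented here, and refs are modelled as a plain set of paths relative to .git rather than read from disk.

```python
# Search order as listed above; "%s" is replaced by the short name.
CANDIDATES = [
    "%s",
    "refs/%s",
    "refs/tags/%s",
    "refs/heads/%s",
    "refs/remotes/%s",
    "refs/remotes/%s/HEAD",
]


def resolve_short_name(name, existing_refs):
    """Return the first candidate path (relative to .git) that exists, or None.

    existing_refs is a set of ref *files*; a directory such as
    refs/remotes/origin is not a ref and so is not in the set.
    """
    for pattern in CANDIDATES:
        candidate = pattern % name
        if candidate in existing_refs:
            return candidate
    return None


if __name__ == "__main__":
    refs = {
        "refs/heads/master",
        "refs/remotes/origin/HEAD",
        "refs/remotes/origin/master",
    }
    print(resolve_short_name("origin", refs))
    print(resolve_short_name("master", refs))
```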




* Re: (unknown)
From: Shawn Pearce @ 2006-10-25 22:16 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200610252303.07900.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> wrote:
> On Wednesday 2006, October 25 19:38, Junio C Hamano wrote:
> 
> > > I did try that, but then the branches don't appear in git branch.  I
> > > still like that they exist.
> >
> > "git branch -r" perhaps.
> 
> That's pretty good.  It makes things like
> 
>   git-log remotes/origin/master..master
> 
> A bit long winded, but it's certainly what I asked for.
> 
> You guys really have thought of everything.

Try the bash completion support in contrib/completion.  If you
are using the bash shell it does branch name completions for most
commands, including both sides of the '..' in log there.  At this
point I can't use Git without it.

-- 


* Re: (unknown)
From: Junio C Hamano @ 2006-10-25 22:16 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200610252303.07900.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> On Wednesday 2006, October 25 19:38, Junio C Hamano wrote:
>
>> > I did try that, but then the branches don't appear in git branch.  I
>> > still like that they exist.
>>
>> "git branch -r" perhaps.
>
> That's pretty good.  It makes things like
>
>   git-log remotes/origin/master..master
>
> A bit long winded, but it's certainly what I asked for.

"git log remotes/origin..master" perhaps?

The point being that when origin is a directory containing a HEAD
that points at something, remotes/origin stands for
remotes/origin/HEAD.


* Re: VCS comparison table
From: Shawn Pearce @ 2006-10-25 22:15 UTC (permalink / raw)
  To: David Lang; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0610251450040.1754@qynat.qvtvafvgr.pbz>

David Lang <dlang@digitalinsight.com> wrote:
> a quick lesson on program naming
> 
> On Wed, 25 Oct 2006, Andreas Ericsson wrote:
> 
> >I'm personally all for a rewrite of the necessary commands in C ("commit" 
> >comes to mind), but as many others, I have no personal interest in doing 
> >the actual work. I'm fairly certain that once we get it working natively 
> >on windows with some decent performance, windows hackers will pick up the 
> >ball and write "wingit", which will be a log viewer and GUI thing for
>              ^^^^^^
> 
> how many other people read this as 'wing it' rather than 'win git'? ;-)

Yes, that's certainly a less than optimal name...

What about gitk?  Is it "gi tk" or "git k" ?  This has actually
been the source of much local debate.  :-)

-- 


* Re: (unknown)
From: Andy Parkins @ 2006-10-25 22:03 UTC (permalink / raw)
  To: git
In-Reply-To: <7vods0b5rk.fsf@assigned-by-dhcp.cox.net>

On Wednesday 2006, October 25 19:38, Junio C Hamano wrote:

> > I did try that, but then the branches don't appear in git branch.  I
> > still like that they exist.
>
> "git branch -r" perhaps.

That's pretty good.  It makes things like

  git-log remotes/origin/master..master

A bit long winded, but it's certainly what I asked for.

You guys really have thought of everything.


Andy
-- 
Dr Andrew Parkins, M Eng (Hons), AMIEE


* Re: VCS comparison table
From: David Lang @ 2006-10-25 21:51 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: Jeff King, David Rientjes, Linus Torvalds, Lachlan Patrick,
	bazaar-ng, git
In-Reply-To: <453F6B7A.60805@op5.se>

a quick lesson on program naming

On Wed, 25 Oct 2006, Andreas Ericsson wrote:

> I'm personally all for a rewrite of the necessary commands in C ("commit" 
> comes to mind), but as many others, I have no personal interest in doing the 
> actual work. I'm fairly certain that once we get it working natively on 
> windows with some decent performance, windows hackers will pick up the ball 
> and write "wingit", which will be a log viewer and GUI thing for
              ^^^^^^

how many other people read this as 'wing it' rather than 'win git'? ;-)

David Lang


* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-25 21:50 UTC (permalink / raw)
  To: git
In-Reply-To: <7v3b9cnlx7.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

>  - Learn the itches David and other people have, that the
>    current git Porcelain-ish does not scratch well, and enrich
>    Documentation/technical with real-world working scripts built
>    around plumbing.

I meant "Documentation/howto"; sorry for the noise.


* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-25 21:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20061025211618.GA30121@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Housing historical implementations seems like it would just lead to
> out-of-date and non-functional examples.

I agree.  Although that ought to be rare in principle, given
that one advertised feature of git is that the plumbing is
supposed to be stable, we occasionally had to subtly
break things to improve plumbing and at the same time run around
to make sure that all the script users (both in-tree and
out-of-tree like Cogito, gitweb and StGIT) are updated.

>>  - Learn the itches David and other people have, that the
>>    current git Porcelain-ish does not scratch well, and enrich
>>    Documentation/technical with real-world working scripts built
>>    around plumbing.
>
> I think this is a better approach. I think it also makes sense to
> let people know that it's an acceptable approach to start new features
> as shell and then have them mature to C (looking at the current
> codebase, and some of Dscho's rantings, one might get the impression
> that git isn't accepting new shell scripts).

New commands like pickaxe and for-each-ref were easier to code
in C, and cherry rewrite in C was really about how crufty the
shell script version was from the beginning (and there weren't
in-tree users of it left so it was not maintained at all but
thanks to plumbing being stable it just kept working perhaps
correctly but still horribly).


* Re: VCS comparison table
From: Jeff King @ 2006-10-25 21:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, bazaar-ng, Linus Torvalds, Lachlan Patrick, David Rientjes
In-Reply-To: <7v3b9cnlx7.fsf@assigned-by-dhcp.cox.net>

On Wed, Oct 25, 2006 at 02:08:20PM -0700, Junio C Hamano wrote:

> the older shell implementation around as reference.  People
> coming to git after 1.3 series certainly do have harder time to
> learn how plumbing would fit together than when git old-timers
> learned it, if that is the area they are interested in, as
> opposed to just using git as a revision tracking system.

I think this is part of the complication of the discussion I'm having with
David. There are really two sets of users for git: people who want to
hack scripts based on plumbing, and people who want everything to "just
work." I think it's a good point that as the system matures (movement
to C and growth of complexity), it might become less easy to hack.

>  - Create examples/ hierarchy in the source tree to house these
>    historical implementations as a reference material, or an
>    entirely different branch or repository to house them.

Housing historical implementations seems like it would just lead to
out-of-date and non-functional examples.

>  - Learn the itches David and other people have, that the
>    current git Porcelain-ish does not scratch well, and enrich
>    Documentation/technical with real-world working scripts built
>    around plumbing.

I think this is a better approach. I think it also makes sense to
let people know that it's an acceptable approach to start new features
as shell and then have them mature to C (looking at the current
codebase, and some of Dscho's rantings, one might get the impression
that git isn't accepting new shell scripts).



* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-25 21:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, David Rientjes, bazaar-ng, git
In-Reply-To: <20061025084810.GA26618@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Oct 24, 2006 at 01:12:52PM -0700, David Rientjes wrote:
>
>> And I would prefer the opposite because we're talking about git.  As an 
>> information manager, it should be seen and not heard.  Nobody is going to 
>> spend their time to become a git or CVS or perforce expert.  As an 
>> individual primarily interested in development, I should not be required 
>> to learn command lines for dozens of different git-specific commands to do 
>> my job quickly and effectively.  I would opt for a much more simpler 
>> approach and deal with shell scripting for many of these commands because 
>> I'm familiar with them and I can pipe any command with the options I 
>> already know and have used before to any other command.
>
> I don't understand how converting shell scripts to C has any impact
> whatsoever on the usage of git. The plumbing shell scripts didn't go
> away; you can still call them and they behave identically.
>
> Is there some specific change in functionality that you're lamenting?

That's also what I wondered, but I also can understand where David is
coming from, and I agree with him to a certain degree.

When I learned git, I learned a lot from trying to piece my own
plumbing together, since there wasn't much Porcelain to speak
of back then.  Then we had many usability enhancements before
the 1.0 release to add Porcelainish done as shell scripts.

This had two positive effects, aside from adding usability.
Interested people had more shell scripts to learn from.  The
scripts were easy to adjust to feature requests from the list,
and as we learned from user experience based on these scripts it
was definitely quicker to codify the best current practice
workflow in them than if they were written in C.  It would have
taken us a lot more effort to add "git commit -o paths" vs "git
commit -i paths" if it were already converted to C, for example.
This continued and our Porcelainish scripts matured quickly.

Then 1.3 series started to move some of the mature ones into C.
As many people already have pointed out, being written in C and
not doing pipe() has two advantages (better portability to
platforms with awkward pipe support and one less process usually
mean better performance).  git-log family with path limiting had
a real boost in performance because the path limiting can be
done in the revision traversal side not diff-tree that used to
be on the downstream side of the pipe.  So this in overall was a
right thing to do.

One thing we lost during the process, however, is a ready access
to the pool of "sample scripts" when people would want to
scratch their own itches.  Linus's original tutorial talked
about "this pattern of pipe is so useful that we have a three
liner shell script wrapper that is called git-foo", and
interested people can easily look at how the plumbing commands
fit together.

The plumbing is still there, and I and people who already know
git would still script around git-rev-list when we need to (by
the way, scripting around git-log is a wrong thing to do -- it
is for human consumption and scripting should be done with
plumbing).  But when we rewrote mature ones in C (and I keep
stressing "mature" because another thing I agree with David is
that shell is definitely easier to futz with), we did not leave
the older shell implementation around as reference.  People
coming to git after the 1.3 series certainly do have a harder time
learning how plumbing fits together than when git old-timers
learned it, if that is the area they are interested in, as
opposed to just using git as a revision tracking system.
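
As an illustration of the kind of plumbing pipe described above (revision traversal upstream, diff-tree with path limiting downstream), here is a hedged sketch. The function names are invented for this example, and the exact options are an assumption about a typical "git rev-list | git diff-tree --stdin" invocation, not a reconstruction of the old shell scripts.

```python
import subprocess


def plumbing_pipeline(rev, path):
    """Return argv lists for: git rev-list REV | git diff-tree --stdin ... -- PATH."""
    rev_list = ["git", "rev-list", rev]
    diff_tree = ["git", "diff-tree", "--stdin", "-r", "--name-status", "--", path]
    return rev_list, diff_tree


def run_pipeline(rev, path, cwd=None):
    """Run the two commands connected by a pipe and return diff-tree's output."""
    rev_list, diff_tree = plumbing_pipeline(rev, path)
    producer = subprocess.Popen(rev_list, stdout=subprocess.PIPE, cwd=cwd)
    consumer = subprocess.Popen(
        diff_tree, stdin=producer.stdout, stdout=subprocess.PIPE, cwd=cwd, text=True
    )
    # Close our copy so rev-list sees a broken pipe if diff-tree exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    # Needs to be run inside a git repository.
    print(run_pipeline("HEAD", "Documentation"))
```

Here the path limiting happens in the downstream process, which is the arrangement the paragraph on the 1.3 series says was slower than limiting during revision traversal.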

We could probably do two things to address this issue:

 - Create examples/ hierarchy in the source tree to house these
   historical implementations as a reference material, or an
   entirely different branch or repository to house them.

 - Learn the itches David and other people have, that the
   current git Porcelain-ish does not scratch well, and enrich
   Documentation/technical with real-world working scripts built
   around plumbing.







* Re: VCS comparison table
From: Jeff King @ 2006-10-25 21:03 UTC (permalink / raw)
  To: David Rientjes; +Cc: Linus Torvalds, Lachlan Patrick, bazaar-ng, git
In-Reply-To: <Pine.LNX.4.64N.0610250954380.31053@attu2.cs.washington.edu>

On Wed, Oct 25, 2006 at 10:21:42AM -0700, David Rientjes wrote:

> Yes, it does.  I'll give you an example from six months ago: there was a 

First off, thanks for giving examples. I was having trouble seeing where
you were coming from.

> need for the group that I work with to support a faster type of hashing 
> function for whatever reason.  This would have been simple with previous 
> versions of git, but if you've ever looked at the SHA1 code in git, you'll 
> realize that you're probably better off never trying to touch it.  There 
> is absolutely _no_ abstraction of it at all and the code is so deeply 
> coupled in the source that abstracting it away is a pain.

Is this really an artifact of the C code versus the shell code? A lot of
parts of the system need to touch SHA1 hashes, and I think it has been
sprinkled throughout the code from the beginning. In fact, I think the
libification of git-rev-list has made the code a lot _cleaner_ (and
shorter), in that the C programs can all use the same nice interface.
The external interface is still there, but now there is consistency
among programs when using rev syntax (ISTR issues in the distant past
where program X didn't understand syntax because the parsing was all
done ad-hoc).

> Likewise, there is always room for personal or organizational tweaks on 
> the part of the developer.  Things like distributed pulling and 
> merging should actually be pretty simple to implement if the complexity 
> wasn't so high in the merge-* family.  This is something I implemented 
> after an enormous headache because we were dealing with very large 
> projects: yes, larger than the Linux kernel.  And this is _exactly_ where 
> piping would help; we have implementations of distributed grep over very 
> large datasets (on the order of terabytes).

I guess I don't see how this was ever any easier. Do you mean that when
we called an external grep, it was easier to plug in your distributed
grep?

> > You can do the same thing in C. In fact, look at how similar
> > git-whatchanged, git-log, and git-diff are.
> No you can't.


The "same thing" I referred to was changing behavior trivially based on
the program name. So yes, you can.

> Making a one line addition, commenting out a line, or changing a
> simple flag in a shell script is much easier.  And like I already

Sure, shell can be easier to modify (though in well-written C, you're
likely just commenting out a few lines or a function call -- maybe you
can argue whether or not git is well-written). However, I remain
unconvinced that this is a common use case, or that it is something that
should weigh heavily when compared with portability, efficiency, or
robustness concerns.

> It's not, it's related to the original vision of git which was meant for 
> efficiency and simplicity.


Simplicity is fine if all you want is plumbing. But normal people want
to _use_ git without hacking their own shell scripts, so it makes sense
to provide the scripts that other people have hacked together (as shell,
perl, C, or whatever). Do I want to use git-send-email? Hell no, the
interface is terrible to me. But do the plumbing commands still exist so
that I can use the scripts I hacked together? Absolutely. I can take
what I want and leave the rest.

> A year ago it was very easy to pick up the package and start using it
> effectively within a couple hours.  Keep in mind that this was without

Was it? The most common complaint I've heard about git, starting a year
ago, was the lack of documentation and tutorials and the complexity of
use.

> tutorials, it was just reading man pages.  Today it would be very
> difficult to know what the essential commands are and how to use them
> simply to get the job done, unless you use the tutorials.  This

I think this has been the case for a long time. It's just that there
_weren't_ tutorials back then.

> Have you never tried to show other people git without giving them a 
> tutorial on the most common uses?  Try it and you'll see the confusion.  
> That _specifically_ illustrates the ever-increasing lack of simplicity 
> that git has acquired.

No, it illustrates a lack of simplicity that currently exists; it says
_nothing_ about the change in simplicity over time.

> There are _not_ scalability improvements.  There may be some slight 
> performance improvements, but definitely not scalability.  If you have 
> ever tried to use git to manage terabytes of data, you will see this 

There has been work on scaling to larger repositories (e.g., mozilla and
xorg prompting work/discussion on cvs importing, subproject/superproject
support, shallow clones, etc), but not on terabyte scales. I realize
that might not help you, but it is helping a lot of people. Quite
honestly, git is focused on SOURCE CODE MANAGEMENT, not terabytes of
data. Perhaps that is your true complaint: git is developing tools for
working with source code, potentially at the loss of some generality
(though I tend to think it hasn't lost generality, but rather it hasn't
gained).

> becomes very clear.  And "rebasing with 3-way merge" is not something 
> often used in industry anyway if you've followed the more common models 
> for revision control within large companies with thousands of engineers.  
> Typically they all work off mainline.

My point isn't that every feature is useful to every developer. My point
is that just because features aren't useful to _you_ doesn't mean
they're not useful at all.

And if you want to talk about industry standard, didn't the discussion
start off with your complaint about porting to Windows? An
industry-standard SCM needs to be cross-platform across the major
operating systems.

> Few months back here on the mailing list.  When I tried cleaning up even 
> one program, I got the response back from the original author "why fix a 
> non-problem?" because his argument was that since it worked the code 
> doesn't matter.

I remember a big discussion about the order of arguments in relational
expressions. Git may have problems, but I just don't see coding style
nitpicks as a priority.

Abstracting the hashing might be worthwhile, but the list consensus was
that it's not worth the work unless we're actually going to _do_
something with the abstraction.  Your argument seems to be that you
_are_ doing something with the abstraction on your own. If you want to
convince the git developers that this is a worthwhile direction, then
show some code which uses it.

> 	http://marc.theaimsgroup.com/?l=git&m=115589472706036

OK, I remember this particular discussion. And I just read through to
the end of the thread; it looks like Junio ended up with "this code is
ugly; fix it" and Johannes did.

It sounds like your real beef was that you want to use some alternate
"mv" command that handles your data set better, and having git-mv as a
shell-script would make that simpler for you.  Well, it isn't a shell
script and it never was. If you want to write it as one, I imagine it
would be considered for inclusion (though I expect the C version may
have some advantages, such as atomicity of file movement and index
updating).



* Re: [PATCH] document the <tree ish> <file> blob reference syntax
From: Junio C Hamano @ 2006-10-25 20:13 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: git
In-Reply-To: <453FBDAA.50305@shadowen.org>

Andy Whitcroft <apw@shadowen.org> writes:

>> +For a more complete list of ways to spell object names, see
>> +"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
>> +
>
> That section seems to have more comprehensive descriptions of the
> various definitions of commit-ish, but not a tree-ish.  Specifically,
> there is no mention of tree-ish:file

  164 * A suffix ':' followed by a path; this names the blob or tree
  165   at the given path in the tree-ish object named by the part
  166   before the colon.
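
To make the quoted rule concrete, a tiny sketch that splits such a name at the first colon. The function name split_tree_path is made up, and this ignores other revision syntax that may itself contain a colon; it only illustrates the "tree-ish before the colon, path after it" reading.

```python
def split_tree_path(spec):
    """Split '<tree-ish>:<path>' into (tree_ish, path) at the first colon."""
    tree_ish, sep, path = spec.partition(":")
    if not sep:
        raise ValueError("no ':' in %r" % (spec,))
    return tree_ish, path


if __name__ == "__main__":
    print(split_tree_path("v1.2.4:git-prune.sh"))
```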



* Re: [PATCH] document the <tree ish> <file> blob reference syntax
From: Jakub Narebski @ 2006-10-25 20:04 UTC (permalink / raw)
  To: git
In-Reply-To: <7vwt6ob5zc.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Andy Whitcroft <apw@shadowen.org> writes:
> 
>> It is possible to specify a specific file within a tree-ish
>> symbolically.  For example you can find the contents of
>> a specific file in a specific commit as below:
>>
>>       git cat-file -p v1.2.4:git-prune.sh
> 
> Didn't we document this elsewhere recently in git-rev-parse?
> How about this instead?

Redundancy in documentation is (usually) a good idea. Perhaps
both?

P.S. "recently" as in "Thu Oct 19 10:04:55 2006 +0700" in 'master',
commit v1.4.3-g6b09c78
-- 
Jakub Narebski
Poland


* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-25 19:53 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git
In-Reply-To: <453F41DE.6090405@op5.se>

Andreas Ericsson <ae@op5.se> writes:

> See the attached screenshot. This is from qgit --all on the git
> repository, but the DAG output is identical to that of gitk. Note in
> particular the 'pu' and 'next' branches. By scrolling down, I can
> easily see the branch-point of any of them.

Looking at this picture I noticed the lack of circles or
rectangles on six commits near the tip of "pu" branch.  Nobody
should be doing an Octopus so it might be a non-issue, but
somehow it looks fishy.



* Re: [PATCH] document the <tree ish> <file> blob reference syntax
From: Andy Whitcroft @ 2006-10-25 19:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwt6ob5zc.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Andy Whitcroft <apw@shadowen.org> writes:
> 
>> It is possible to specify a specific file within a tree-ish
>> symbolically.  For example you can find the contents of
>> a specific file in a specific commit as below:
>>
>> 	git cat-file -p v1.2.4:git-prune.sh
> 
> Didn't we document this elsewhere recently in git-rev-parse?
> How about this instead?
> 
> -- >8 --
> [PATCH] Refer to git-rev-parse:Specifying Revisions from git.txt
> 
> The brief list given in "Symbolic Identifiers" section of the
> main documentation is good enough for overview, but help the
> reader to find a more comprehensive list as needed.
> 
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> ---
>  Documentation/git.txt |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 3af6fc6..b00607e 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -562,6 +562,9 @@ HEAD::
>  	a valid head 'name'
>  	(i.e. the contents of `$GIT_DIR/refs/heads/<head>`).
>  
> +For a more complete list of ways to spell object names, see
> +"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
> +

That section seems to have more comprehensive descriptions of the
various definitions of commit-ish, but not of tree-ish.  Specifically,
there is no mention of tree-ish:file



* Re: updating only changed files source directory?
From: Daniel Barkalow @ 2006-10-25 19:35 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git
In-Reply-To: <453F517A.7060000@xs4all.nl>

On Wed, 25 Oct 2006, Han-Wen Nienhuys wrote:

> How can I set the object database?  I found GIT_OBJECT_DIRECTORY, but can I
> write a config file entry for that?

If you clone with --shared, it'll do the right thing automatically, which 
is to have the clone's .git/objects/info/alternates be the objects 
directory of the bare repository.

(Note that any new objects you create in the clone go into the clone's own 
objects database. This shouldn't matter for you, unless your build system 
is tagging things or something, but if you end up doing development in a 
similarly structured system, it's worth knowing that this doesn't affect 
the bare repository at all.)

> yes, this works. Thanks!

No problem. :)

	-Daniel


* Re: an option to make "git-diff Z A" prints Z's diff before A's
From: Junio C Hamano @ 2006-10-25 19:16 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git
In-Reply-To: <871wowzx15.fsf@rho.meyering.net>

Jim Meyering <jim@meyering.net> writes:

> In a recent patch set I prepared, I placed the names of the
> more relevant files at the front of the list given to "git-diff".
>...
> I know about the -O<orderfile> option, and it can make git-diff do
> what I want, but only if I first create a separate file containing
> the names that I'm already providing to git-diff in the very same order.
>
> Is there an easier way?

No, not right now.
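
Until such an option exists, the temporary file can at least be generated from the same list of names. The following is only a sketch: `make_orderfile` is a hypothetical helper, not part of git, and it assumes `mktemp` is available (it is common but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical helper (not a git command): write each argument on its
# own line to a temporary file and print that file's name.  The result
# is meant to be passed to "git diff -O<orderfile>".
make_orderfile () {
    orderfile=$(mktemp) || return 1
    printf '%s\n' "$@" >"$orderfile"
    echo "$orderfile"
}

# The names are given once, in the order the diffs should appear.
order=$(make_orderfile Z A)
cat "$order"

# Inside a repository one would then run something like:
#   git diff -O"$order" -- Z A
#   rm -f "$order"
```

The names still have to be passed twice to git-diff (once through the file, once as path limiters), but they are typed only once.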

> If not, would you be amenable to a new option enabling this behavior
> without requiring a temporary file?

The thing is, "git diff -- Z A" does *not* mean:

	I know I have a file called Z and a file called A;
	please give diff for these files.

What it means is:

	Please give me the diff as usual, but I care about paths
	that match these patterns, Z or A.

So "git diff -- Documentation" names all changed files in that
directory; you could also spell it "Documentation/" for clarity.

git-diff traverses two tree-like things (either tree-vs-tree,
tree-vs-index, or tree-vs-working tree) in parallel in the
canonical order, but skips comparing paths that do not match the
list of patterns you gave on the command line.  While it does
so, we do not record which pattern caused the path to be
included in the output anywhere, so there currently is no way to
tell which ones matched an earlier pattern and which ones
matched a later one.

If somebody wants to do this, the places to modify would be the
following:

 - add a new parameter, "int match_number", to change_fn_t and
   add_remove_fn_t functions, and add a new member to struct
   diff_filepair to record it.

 - update all callers of diff_addremove, diff_change, and
   diff_unmerge to pass the number of the command-line pathspec
   that matched the path to be included (in your example, if
   both Z and A were directories, file Z/foo gets number 1 and
   file A/bar gets number 2).

 - update diff_addremove, diff_change and diff_unmerge to pass
   that match_number to diff_queue(), and make diff_queue()
   record the number in the new diff_filepair it creates.

 - in places where an existing filepair is split into two and
   two existing filepairs are merged into one (e.g. "break" and
   "rename"), make sure match_number is propagated sensibly from
   the original filepairs to the modified ones.

 - in diffcore_std(), if orderfile is not in use, use the
   match_number to sort the queued filepairs.
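
The ordering that the steps above would produce can be modelled outside git. This is only a sketch in shell, not git code: the helper `match_number` is named after the proposed parameter, the path list is made up, and plain leading-directory prefixes stand in for real pathspec matching.

```shell
#!/bin/sh
# match_number PATH PATTERN...
# Print the 1-based position of the first pattern that PATH falls under
# (exact match or leading directory).  Fail if no pattern matches.
match_number () {
    p=$1
    shift
    n=0
    for pat
    do
        n=$((n + 1))
        case "$p" in
        "$pat" | "$pat"/*)
            echo "$n"
            return 0
            ;;
        esac
    done
    return 1
}

# Paths as a tree traversal would yield them, in canonical order.
paths="A/bar
M/skipped
Z/foo"

# Tag each matching path with its number, drop the paths that match no
# pattern, then order the survivors by that number.
sorted=$(
    printf '%s\n' "$paths" |
    while IFS= read -r path
    do
        num=$(match_number "$path" Z A) && printf '%s %s\n' "$num" "$path"
    done |
    sort -n -k1,1 |
    cut -d' ' -f2-
)
echo "$sorted"
```

With the patterns given as "Z A", Z/foo is numbered 1 and A/bar is numbered 2, so Z/foo comes out first even though the traversal visits A/bar first.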



* Re: Bugreport: core-tutorial example outdated?
From: Clemens Koller @ 2006-10-25 19:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0610251923020.3286@wbgn013.biozentrum.uni-wuerzburg.de>

Hello, Dscho!

>>I just studied
>>http://www.kernel.org/pub/software/scm/git/docs/core-tutorial.html
> 
> Did you actually add a file with the content "Hello World\n"? If not, you 
> should not be surprised.

Argh... yes, I even adapted my numbers to my case, but I just didn't
include the /55/... two-letter directory name in the /55/7db03de997 object names... :-(

$ ls .git/objects/??/*
.git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238
.git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962
$ git-cat-file -t 7db03de997           <- wrong!
fatal: Not a valid object name 7db03de997
$ git-cat-file -t 557db03de            <- correct!
blob
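
The rule behind this is that a loose object lives under a directory named after the first two hex digits of its name, with the remaining 38 digits as the file name. A minimal sketch of putting the two halves back together; `object_name_from_path` is a hypothetical helper, not a git command.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the 40-hex-digit object name from a
# loose-object path of the form .../XX/YYYY... (2 + 38 hex digits).
# Prints nothing if the argument does not have that shape.
object_name_from_path () {
    printf '%s\n' "$1" |
    sed -n 's|^.*/\([0-9a-f]\{2\}\)/\([0-9a-f]\{38\}\)$|\1\2|p'
}

name=$(object_name_from_path .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238)
echo "$name"
```

Any abbreviation handed to git-cat-file has to be a prefix of that full name, which is why 557db03de works and 7db03de997 does not.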

Jup, those little details... once you don't stick to any tags.
Fine! Thanks! The tutorial, too!

Best greets,

Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm.de
Phone: +49-89-741518-50


* Re: [PATCH] git-fetch.sh printed protocol fix
From: Junio C Hamano @ 2006-10-25 18:52 UTC (permalink / raw)
  To: Tuncer Ayaz; +Cc: git
In-Reply-To: <4ac8254d0610250303n60a6006bsa4d77aba7255485f@mail.gmail.com>

"Tuncer Ayaz" <tuncer.ayaz@gmail.com> writes:

> As a feature I wished for (ftp:// support in git-fetch) was added in 1.4.3,
> I tested that feature and found a minor logging issue. The mini-patch
> below fixes that.  AFAIK the pattern expansion feature I've used should
> be available in any current /bin/sh. If not, we will have to find another
> way to print the protocol part of the used fetch URL.

Yes, we also have supported https:// that way for a long time.

> --- git-core-1.4.3.2/git-fetch.sh	2006-10-24 07:29:47.000000000 +0200
> +++ git-core-1.4.3.2.tma/git-fetch.sh	2006-10-25 11:44:34.000000000 +0200
> @@ -310,7 +310,7 @@
> 	  done
>  	  expr "z$head" : "z$_x40\$" >/dev/null ||
> 	      die "Failed to fetch $remote_name from $remote"
> -	  echo >&2 Fetching "$remote_name from $remote" using http
> +	  echo >&2 Fetching "$remote_name from $remote" using ${remote%%:*}
> 	  git-http-fetch -v -a "$head" "$remote/" || exit
> 	  ;;
>       rsync://*)

As you noticed, we have stayed away from using ${parameter#word} or
${parameter%word} substitutions so far, to be as compatible with
vanilla shells as possible (I know even dash, which is pretty
much the most minimal, supports it -- the syntax is in POSIX).  I
am a bit reluctant to take this implementation right now.  We
tend to use the colon form of "expr" for things like this.

It might make sense to do a survey of userbase at some point to
see if everybody's shell that works with the current set of
scripts understands the substring substitution, and after
finding it out switch many invocations of expr to substring
substitutions.

For now I'd take the patch but change it to match others to use
expr.
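
For comparison, here are the two spellings side by side. This is a sketch only; the URL is a made-up example and the variable names are illustrative.

```shell
#!/bin/sh
# Illustrative URL; any scheme://host/path value behaves the same way.
remote="ftp://example.org/pub/project.git"

# Colon form of expr: the pattern is anchored at the start of the
# string, and the \( ... \) group (everything before the first colon)
# is what gets printed.
proto_expr=$(expr "$remote" : '\([^:]*\):')

# POSIX parameter expansion: remove the longest suffix matching ':*'.
proto_subst=${remote%%:*}

echo "expr:  $proto_expr"
echo "subst: $proto_subst"
```

The two differ when the value has no colon at all: the expr form prints an empty string, while the substitution leaves the value unchanged. The expr form also costs an extra process per call.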

Next time around, please sign your patch.

-- >8 -- 
From: Tuncer Ayaz <tuncer.ayaz@gmail.com>
Date: Wed, 25 Oct 2006 12:03:06 +0200
Subject: [PATCH] git-fetch.sh printed protocol fix

We have supported https:// protocol for some time and in 1.4.3
added ftp:// protocol.  The transfers were still reported to be
over http.

[jc: Tuncer used substring parameter substitution ${remote%%:*}
 but I am deferring it to a later day.  We should replace
 colon-expr with substring substitution after everybody's shell
 can grok it someday, but we are not in a hurry. ]

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 git-fetch.sh |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/git-fetch.sh b/git-fetch.sh
index 79222fb..9eedf8b 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -287,6 +287,7 @@ fetch_main () {
       # There are transports that can fetch only one head at a time...
       case "$remote" in
       http://* | https://* | ftp://*)
+	  proto=`expr "$remote" : '\([^:]*\):'`
 	  if [ -n "$GIT_SSL_NO_VERIFY" ]; then
 	      curl_extra_args="-k"
 	  fi
@@ -310,7 +311,7 @@ fetch_main () {
 	  done
 	  expr "z$head" : "z$_x40\$" >/dev/null ||
 	      die "Failed to fetch $remote_name from $remote"
-	  echo >&2 Fetching "$remote_name from $remote" using http
+	  echo >&2 "Fetching $remote_name from $remote using $proto"
 	  git-http-fetch -v -a "$head" "$remote/" || exit
 	  ;;
       rsync://*)
-- 
1.4.3.2.gc1a4



* Re: Question about commit message conventions
From: Junio C Hamano @ 2006-10-25 18:48 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git
In-Reply-To: <453F8187.9060208@op5.se>

Andreas Ericsson <ae@op5.se> writes:

> If you sift through the Linux kernel, you will find numerous patches
> where subsystem maintainers have acked patches sent to them. I *think*
> this usually means that they have reviewed the patch and approve of
> it, but have not modified it. The Ack is then solely for Linus' benefit
> and tells him that at least one pair of eyes has already gone over
> the patch.

Correct.

> Subsys maintainers sometimes also add Signed-off-by: lines, which I
> assume means they have tweaked the patch somewhat or somehow
> collaborated with the author in producing it.
>...
> Lots of guesswork here, but in a sane world I can't be too far off the
> mark ;-)

Documentation/SubmittingPatches makes it unnecessary to make any
guesses on S-o-b lines.  Regarding subsystem maintainer
sign-offs, you are referring to DCO 1.1 (b), but the signature
could have been made under DCO 1.1 (c).

In plain terms, the signer vouches that the patch was passed on
either intact or with modifications, and that both the original
and the modifications are releasable, to the best of the signer's
knowledge, under open source terms.



* Re: [PATCH] Minor grammar fixes for git-diff-index.txt
From: Junio C Hamano @ 2006-10-25 18:42 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200610251602.42433.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> From 9f5b5b3d4925ac5f22a64fd075c50417cff7b496 Mon Sep 17 00:00:00 2001
> From: Andy Parkins <andyparkins@gmail.com>
> Date: Wed, 25 Oct 2006 15:59:53 +0100
> Subject: [PATCH] Minor grammar fixes for git-diff-index.txt
> To: git@vger.kernel.org

We do not want these in-body.

>
> "what you are going to commit is" doesn't need the "is" and does need a comma.
>
> "can trivially see" is an unnecessary split infinitive and "easily" is a more
> appropriate adverb.
> Signed-off-by: Andy Parkins <andyparkins@gmail.com>
> ---
> This corrects the previous grammar patch - the original use of "where" was 
> correct.  You know when you say a word enough and it loses all meaning...
>
>  Documentation/git-diff-index.txt |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-diff-index.txt 
> b/Documentation/git-diff-index.txt
> index 9cd43f1..2df581c 100644
> --- a/Documentation/git-diff-index.txt
> +++ b/Documentation/git-diff-index.txt
> @@ -54,7 +54,7 @@ If '--cached' is specified, it allows yo
>  
>  For example, let's say that you have worked on your working directory, 
> updated

Please check who wrapped this line and correct it.  Most likely
your MUA.


* Re: VCS comparison table
From: Aaron Bentley @ 2006-10-25 18:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Erik Bågfors, bazaar-ng, git, Jakub Narebski
In-Reply-To: <Pine.LNX.4.64.0610231623340.3962@g5.osdl.org>


Linus Torvalds wrote:
> 
> On Tue, 24 Oct 2006, Erik Bågfors wrote:
> 
>>I don't see any problem doing a "gitk --all" equivalent in bzr.
> 
> 
> The problem? How do you show a commit that is _common_ to two branches, 
> but has different revision names in them?

If you're talking about the old-style single-integer revnos, each
revision only has one of those, because that revision dictates the path
you must take to the origin when determining its revno.  Many others may
share that revno, but each revision has only one.

The new-style dotted-series-of-ints revnos, I agree, will change.
They're not something I use.

Aaron


