Git development


* Recent unresolved issues
From: Junio C Hamano @ 2006-04-14  9:31 UTC (permalink / raw)
  To: git

Here is a list of topics in the recent git traffic that I feel
were inadequately addressed.  I've commented on some of them to give
people a feel for what my priorities are.  Somebody might want
to rehash the ones low on my priority list to conclusion with a
concrete proposal if they cared about them enough.  The list is
*not* ordered in any way.

Also please add whatever I missed (or dismissed).  I am hoping
this will be a good basis for the 1.4 to-do list.

* Message-ID: <Pine.LNX.4.64.0604121828370.14565@g5.osdl.org>
  Common option parsing (Linus Torvalds)

* Message-ID: <Pine.LNX.4.64.0604050855080.2550@localhost.localdomain>
  Binary diff output? (Nicolas Pitre)

  I do not think this is needed for our primary audience (the
  kernel project), but I am sure it would be helpful for some
  other projects if we allowed them to exchange patches that
  describe binary file changes via e-mail, so I am not
  dismissing this.  Needs to wait for "option parsing".

* Message-ID: <Pine.LNX.4.64.0604111725590.14565@g5.osdl.org>
  Colored diff? (Linus Torvalds)

  I am not opposed to it, but I'd like to do that internally if
  we go this route.  Needs to wait for "option parsing".  Also
  Message-ID: <3536.10.10.10.24.1114117965.squirrel@linux1> is
  slightly related to this.

* Message-ID: <7vek02ynif.fsf@assigned-by-dhcp.cox.net>
  diff --with-raw, --with-stat? (me)

  I think "git diff" can be internalized next, after "option
  parsing" unification.  When that is done, --with-stat would
  help internalize format-patch's process_one(), and it would be
  trivial to do "git log --pretty=format-patch master..next".

* #irc 2006-04-10
  Shallow clones (Carl Worth).

  The experiment last round did not work out very well, but as
  existing repositories get bigger, and more projects are
  migrated from foreign SCM systems, this would go from
  nice-to-have to must-have.

  I am beginning to think using "graft" to cauterize history
  for this, while it technically would work, would not be so
  helpful to users, so the design needs to be worked out again.

* Message-ID: <E1FMH3o-0001B5-Dw@jdl.com>
  git status does not distinguish contents changes and mode
  changes; it just says "modified" (Jon Loeliger).

  Unconditionally changing the status letter would break
  Porcelains so we would need an extra option to do this.
  An outline patch has already been prepared -- this perhaps has
  to wait until we sort out the "option parsing" one.

* Message-ID: <tnxmzf9sh7k.fsf@arm.com>
  git could use diff3 instead of merge which is a wrapper around
  diff3. (Catalin Marinas)

  If having "diff3" is a lot more common than having "merge", I
  do not have a problem with this; "merge" being a wrapper to
  "diff3", people who have been happy with the current code
  would certainly have "diff3" installed so changing to "diff3"
  would not break them.

* Message-ID: <81b0412b0603020649u99a2035i3b8adde8ddce9410@mail.gmail.com>
  Windows problems summary (Alex Riesen)

  A good list to keep in mind.

* Message-ID: <Pine.LNX.4.64.0604030730040.3781@g5.osdl.org>
  Huge packfiles (Linus Torvalds)

  Because I do not think asking users to break up packs into
  manageable, mmap()able sizes is too much to ask, I would not
  advocate too strongly for updating the pack idx to 64-bit
  offsets and mmap()ing parts of a packfile.

  However, we currently lack tool support or a recipe for users
  with such a repository to easily break up packs.

* Message-ID: <1143856098.3555.48.camel@dv>
  Per branch property, esp. where to merge from (Pavel Roskin)

  This involves user-level "world model" design, which is more
  Porcelainish than Plumbing, and as people know I do not do
  Porcelain well; interested parties need to come up with what
  they want and how they want to use it.

^ permalink raw reply

* Re: Solaris test t5500 race condition
From: Peter Eriksen @ 2006-04-14 11:53 UTC (permalink / raw)
  To: git
In-Reply-To: <7vhd4wvhyq.fsf@assigned-by-dhcp.cox.net>

On Thu, Apr 13, 2006 at 10:34:05PM -0700, Junio C Hamano wrote:
> "Peter Eriksen" <s022018@student.dtu.dk> writes:
> 
> >     Generating pack...
> >     Done counting 3 objects.
> >     Deltifying 3 objects.
> >       33% (1/3) done^M  66% (2/3) done^M 100% (3/3) done
> >     Total 3Unpacking , written 33 objects          <------------
> >      (delta 0), reused 0 (delta 0)
> >     11fa2f0cb58ed7f02dbd5ac75ed82a53fae62a7b refs/heads/A
> 
> Hmph.  Not good.  Before the writer managed to flush the report
> the reader has already decoded the header and reports the number
> of objects it is going to unpack.
...
> -- >8 --
> [PATCH] t5500: test fix

With the patch it doesn't complain anymore.  There are many other 
problems with the tests on Solaris though.

Peter

^ permalink raw reply

* Re: Recent unresolved issues
From: Petr Baudis @ 2006-04-14 16:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Fri, Apr 14, 2006 at 11:31:36AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Here is a list of topics in the recent git traffic that I feel
> inadequately addressed.  I've commented on some of them to give
> people a feel for what my priorities are.  Somebody might want
> to rehash the ones low on my priority list to conclusion with a
> concrete proposal if they cared about them enough.  The list is
> *not* ordered in any way.

Nice summary!

> * Message-ID: <tnxmzf9sh7k.fsf@arm.com>
>   git could use diff3 instead of merge which is a wrapper around
>   diff3. (Catalin Marinas)
> 
>   If having "diff3" is a lot more common than having "merge", I
>   do not have problem with this; "merge" being a wrapper to
>   "diff3", people who have been happy with the current code
>   would certainly have "diff3" installed so changing to "diff3"
>   would not break them.

I've decided to bite the bullet and made Cogito use diff3 instead of
merge as of now. Let's see if anybody complains...

> * Message-ID: <1143856098.3555.48.camel@dv>
>   Per branch property, esp. where to merge from (Pavel Roskin)
> 
>   This involves user-level "world model" design, which is more
>   Porcelainish than Plumbing, and as people know I do not do
>   Porcelain well; interested parties need to come up with what
>   they want and how they want to use it.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: Default remote branch for local branch
From: Petr Baudis @ 2006-04-14 16:16 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Pavel Roskin, Junio C Hamano, git
In-Reply-To: <200604021817.30222.Josef.Weidendorfer@gmx.de>

Dear diary, on Sun, Apr 02, 2006 at 06:17:29PM CEST, I got a letter
where Josef Weidendorfer <Josef.Weidendorfer@gmx.de> said that...
> > I would write the config like this:
> > 
> > [branch-upstream]
> > master = linus
> > ata-irq-pio = irq-pio
> > ata-pata = pata-drivers
> 
> That is not working, as said above. But with above syntax extension,
> with s/=/for/ it would be fine.

I'm sorry but I'm slow and I don't see it - why wouldn't this work?
(Except that the key name is case insensitive, which isn't too big a
deal IMHO.)

I for one think that the 'for'-syntax is insane - it's unreadable (your
primary query is by far most likely to be "what's the upstream when on
branch X", not "what branches is this upstream for"), would convolute
the configuration file syntax unnecessarily and would possibly also
complicate the git-repo-config interface. Pavel's syntax is much nicer.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* git-stripspace breakage
From: Linus Torvalds @ 2006-04-14 16:40 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List

Junio,
 the current git-stripspace leaves extra newlines at the end, causing ugly 
commit logs in "git log". I assume/suspect that it's the recent 
"incomplete line" handling (that I acked, bad me), but I didn't actually 
test.

Trivially tested thus:

	[torvalds@g5 git]$ git-stripspace <<EOF
	> a
	> 
	> EOF
	a

	[torvalds@g5 git]$ 

note the extra unnecessary newline..

		Linus

^ permalink raw reply

* Re: Solaris test t5500 race condition
From: Jason Riedy @ 2006-04-14 16:41 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: git
In-Reply-To: <20060414115317.GA5191@bohr.gbar.dtu.dk>

And "Peter Eriksen" writes:
 - > -- >8 --
 - > [PATCH] t5500: test fix
 - 
 - With the patch it doesn't complain anymore.  There are many other 
 - problems with the tests on Solaris though.

I just ran next branch's tests on 5.8 with no problems.  Could 
you be a bit more specific?

Jason

^ permalink raw reply

* Re: Fix up diffcore-rename scoring
From: Geert Bosch @ 2006-04-14 17:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzer4vmm.fsf@assigned-by-dhcp.cox.net>


On Apr 11, 2006, at 18:04, Junio C Hamano wrote:
>> Here's a possible way to do that first cut. Basically,
>> compute a short (256-bit) fingerprint for each file, such
>> that the Hamming distance between two fingerprints is a measure
>> for their similarity. I'll include a draft write up below.
>
> Thanks for starting this.
>
> There are a few things I need to talk about the way "similarity"
> is _used_ in the current algorithms.
>
> Rename/copy detection outputs "similarity" but I suspect what
> the algorithm wants is slightly different from what humans think
> of "similarity".  It is somewhere between "similarity" and
> "commonness".  When you are grading a 130-page report a student
> submitted, you would want to notice that last 30 pages are
> almost verbatim copy from somebody else's report.  The student
> in question added 100-page original contents so maybe this is
> not too bad, but if the report were a 30-page one, and the
> entire 30 pages were borrowed from somebody else's 130-page
> report, you would _really_ want to notice.

There just isn't enough information in a 256-bit fingerprint
to be able to determine if two strings have a long common
substring. Also, when the input gets longer, like a few MB,
or when the input has little information content (compresses
very well), statistical bias will reduce reliability.

Still, I used the similarity test on large tar archives, such
as complete GCC releases, and it does give reasonable
similarity estimates. Non-related inputs rarely have scores
above 5.

potomac% ../gsimm -rd026c470aab28a1086403768a428358f218bba049d47e7d49f8589c2c0baca0c *.tar
55746560 gcc-2.95.1.tar 123 3.1
55797760 gcc-2.95.2.tar 112 11.8
55787520 gcc-2.95.3.tar 112 11.8
87490560 gcc-3.0.1.tar 112 11.8
88156160 gcc-3.0.2.tar 78 38.6
86630400 gcc-3.0.tar 80 37.0
132495360 gcc-3.1.tar 0 100.0

I'm mostly interested in the data storage aspects of git,
looking bottom-up at the blobs stored and deriving information
from that. My similarity estimator allows one to look at thousands
of large checked in files and quickly identify similar files.
For example, in the above case, you'd find it makes sense
to store gcc-3.1.tar as a difference from gcc-3.0.tar.
Doing an actual diff between these two archives takes a few
seconds, while the fingerprints can be compared in microseconds.
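
For readers following along, here is a minimal sketch of how two
256-bit fingerprints could be compared.  It is not gsimm's code: the
function names are made up, and the score formula (100 * (1 - d/127),
floored at zero) is only inferred from the listing above, where it
happens to reproduce the printed distance/score pairs.

```python
def hamming_distance(fp_a, fp_b):
    """Number of differing bits between two hex-encoded fingerprints."""
    return bin(int(fp_a, 16) ^ int(fp_b, 16)).count("1")


def score(distance):
    # Assumed mapping, inferred from the listing rather than from gsimm:
    # 0 bits apart -> 100.0, 127 or more bits apart -> 0.0.
    return max(0.0, round(100.0 * (1.0 - distance / 127.0), 1))


if __name__ == "__main__":
    # Distances taken from the third column of the listing.
    for d in (123, 112, 78, 80, 0):
        print(d, score(d))
```

Comparing two fingerprints this way is a single XOR and a bit count,
which is why it is so much cheaper than running an actual diff.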

> While reorganizing a program, a nontrivial amount of text is
> often removed from an existing file and moved to a newly created
> file.  Right now, the way similarity score is calculated has a
> heuristical cap to reject two files whose sizes are very
> different, but to detect and show this kind of file split, the
> sizes of files should matter less.
The way to do this is to split a file at content-determined
breakpoints: check the last n bits of a cyclic checksum over
a sliding window, and break if they match a magic number.
This would split the file in blocks with expected size of 2^n.
Then you'd store a fingerprint per chunk.
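
The breakpoint rule can be sketched as follows.  This is an
illustration only: a simple polynomial rolling hash stands in for the
cyclic checksum, and WINDOW, BITS, MAGIC and split_chunks are names
and values chosen for the example, not anything in git or gsimm.

```python
WINDOW = 16            # sliding-window width in bytes (assumed)
BITS = 6               # n: expected chunk size is roughly 2**n bytes
MAGIC = 0              # value the low n bits must match (assumed)
BASE = 257
MOD = (1 << 31) - 1


def split_chunks(data):
    """Split a bytes object at content-determined breakpoints."""
    top = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    mask = (1 << BITS) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # shift in the newest byte
        # Break when the low n bits of the window hash match the magic number.
        if i + 1 >= WINDOW and (h & mask) == MAGIC:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks
```

Because each breakpoint depends only on the last WINDOW bytes,
inserting a byte near the start of a file changes the first chunk or
two and leaves the later chunks, and hence their fingerprints, alone.
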
> [...]
> Another place we use "similarity" is to break a file that got
> modified too much.  This is done for two independent purposes.
This could be done directly using the given algorithm.

> [...] Usually rename/copy
> detection tries to find rename/copy into files that _disappear_
> from the result, but with the above sequence, B never
> disappears.  By looking at how dissimilar the preimage and
> postimage of B are, we tell the rename/copy detector that B,
> although it does not disappear, might have been renamed/copied
> from somewhere else.
This could also be cheaply determined by my similarity estimator.
Almost always, you'd have a high similarity score. When there is
a low score, you could verify with a more precise and expensive
algorithm to have a consistent decision on what is considered
a break.

There is a -v option that gives more verbose output, including
estimated and actual average distances from the origin for the
random walks. For random input they'll be very close, but for
input with a lot of repetition the actual average will be far
larger. The ratio can be used as a measure of reliability of
the fingerprint: ratios closer to 1 are better.
> Also we can make commonness matter even more in the similarity
> used to "break" a file than rename detector, because if we are
> going to break it, we will not have to worry about the issue of
> showing an annoying diff that removes 100 lines after copying a
> 130-line file.  This implies that the break algorithm needs to
> use two different kinds of similarity, one for breaking and then
> another for deciding how to show the broken pieces as a diff.
>
> Sorry if this write-up does not make much sense.  It ended up
> being a lot more incoherent than I hoped it to be.
Regular diff algorithms will always give the most precise result.
What my similarity estimator does is give a probability that
two files have a lot of common substrings. Say, you'd have a
git archive with 10,000 blobs of about 1 MB, and you'd want
to determine how to pack this. You clearly can't use diff
programs to solve this, but you can use the estimates.

> Anyway, sometime this week I'll find time to play with your code
> myself.
Thanks, I'm looking forward to your comments.

   -Geert

^ permalink raw reply

* Re: Default remote branch for local branch
From: Josef Weidendorfer @ 2006-04-14 18:26 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20060414161627.GA27689@pasky.or.cz>

On Friday 14 April 2006 18:16, you wrote:
> Dear diary, on Sun, Apr 02, 2006 at 06:17:29PM CEST, I got a letter
> where Josef Weidendorfer <Josef.Weidendorfer@gmx.de> said that...
> > > I would write the config like this:
> > > 
> > > [branch-upstream]
> > > master = linus
> > > ata-irq-pio = irq-pio
> > > ata-pata = pata-drivers
> > 
> > That is not working, as said above. But with above syntax extension,
> > with s/=/for/ it would be fine.
> 
> I'm sorry but I'm slow and I don't see it - why wouldn't this work?
> (Except that the key name is case insensitive, which isn't too big a
> deal IMHO.)

Hmm...
* IMHO "keys are case insensitive" is enough to not qualify for branch
names: currently, branch names are case sensitive, and with above syntax you
effectively change this rule (you can not distinguish upstreams for "master"
vs. "MASTER").
* a dot currently seems to be allowed in branch names. For config keys, the
dot separates subkeys.
* I thought it is a convention for config keys to be alphanum only,
eg. "/" isn't allowed, too (which is mandatory for branch names).
Unfortunately, I found nothing about allowed chars for config keys in the
documentation.
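
To make the first two points concrete, here is a toy model of such a
key store.  It is not git's config parser; store_upstream and its
lowercasing and dot-splitting rules are assumptions that mirror the
behaviour described above.

```python
def store_upstream(table, branch, upstream):
    """Record 'branch = upstream' in a case-insensitive, dotted key store."""
    parts = branch.lower().split(".")   # keys fold case; dots separate subkeys
    node = table
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = upstream
    return table


table = {}
store_upstream(table, "master", "linus")
store_upstream(table, "MASTER", "other")       # silently replaces "master"
store_upstream(table, "fixes.2.6", "stable")   # turns into nested subkeys
print(table)
```

In this model the two differently-cased branches end up sharing one
entry, and the dotted branch name is no longer a single key.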

> I for one think that the 'for'-syntax is insane - it's unreadable (your
> primary query is by far most likely to be "what's the upstream when on
> branch X", not "what branches is this upstream for"), would convolute
> the configuration file syntax unnecessarily and would possibly also
> complicate the git-repo-config interface.

As far as I remember, the "... for ..." syntax was suggested by Linus for the
proxy.command config a long time ago. The original proposal there was to
use a URL as the key part.

That said,

> Pavel's syntax is much nicer. 

... I agree with you here.

My suggestion would be to allow an optional syntax in the config file which is mapped
by git-repo-config to the normalized "... for ..."-scheme.
Eg. it should not be mandatory to specify "for ..." after the value of a key.
So instead of

  branch.upstream = linus for master

you should be able to say

  [branch]
  upstream for master = linus
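
The proposed mapping could be sketched like this.  The function name
normalize is hypothetical and is not part of git-repo-config; the
sketch only shows the optional in-file form being rewritten to the
normalized "... for ..." scheme.

```python
def normalize(section, line):
    """Map 'key for X = value' in [section] to ('section.key', 'value for X')."""
    lhs, value = (part.strip() for part in line.split("=", 1))
    key, sep, qualifier = lhs.partition(" for ")
    name = f"{section}.{key.strip()}"
    if sep:                       # optional "for ..." given on the key side
        return name, f"{value} for {qualifier.strip()}"
    return name, value            # plain "key = value" is passed through


print(normalize("branch", "upstream for master = linus"))
```

Both spellings then reach the rest of the code as the same key and
value, so only the reader of the file has to know about the shorthand.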

Josef

^ permalink raw reply

* [PATCH] cg-admin-rewritehist: Seed the commit map with the parents specified with -r.
From: Johannes Sixt @ 2006-04-14 18:54 UTC (permalink / raw)
  To: git

When the first commit is manufactured, its parents are looked up in the
commit map. However, without this patch the map is always empty at that time.
If the entire history is rewritten, this is no problem because the first
commit does not have any parents anyway. However, if -r is used to constrain
rewriting to only part of the history, this first commit is manufactured
incorrectly without parents because 'cat' fails.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>

---

 cg-admin-rewritehist |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

ec09427d1fb4097c15fd6df4f07049a536bb7d2c
diff --git a/cg-admin-rewritehist b/cg-admin-rewritehist
index 9c49d80..b72c641 100755
--- a/cg-admin-rewritehist
+++ b/cg-admin-rewritehist
@@ -138,6 +138,7 @@ _git_requires_root=1

 tempdir=.git-rewrite
 startrev=
+startrevparents=
 filter_env=
 filter_tree=
 filter_index=
@@ -149,6 +150,7 @@ while optparse; do
 		tempdir="$OPTARG"
 	elif optparse -r=; then
 		startrev="^$OPTARG $OPTARG $startrev"
+		startrevparents="$OPTARG $startrevparents"
 	elif optparse --env-filter=; then
 		filter_env="$OPTARG"
 	elif optparse --tree-filter=; then
@@ -182,6 +184,11 @@ ret=0

 mkdir ../map # map old->new commit ids for rewriting parents
+
+# seed with identity mappings for the parents where we start off
+for commit in $startrevparents; do
+	echo $commit > ../map/$commit
+done

 git-rev-list --topo-order HEAD $startrev | tac >../revs
 commits=$(cat ../revs | wc -l)
-- 
1.3.0.rc2

^ permalink raw reply related

* Re: git-stripspace breakage
From: Junio C Hamano @ 2006-04-14 19:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604140936520.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Junio,
>  the current git-stripspace leaves extra newlines at the end, causing ugly 
> commit logs in "git log". I assume/suspect that it's the recent 
> "incomplete line" handling (that I acked, bad me), but I didn't actually 
> test.

Bad me too indeed.  I noticed it last night after writing
"What's in" message.  Will fix shortly.

Thanks.

^ permalink raw reply

* Re: Recent unresolved issues
From: sean @ 2006-04-14 19:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>

On Fri, 14 Apr 2006 02:31:36 -0700
Junio C Hamano <junkio@cox.net> wrote:

> * Message-ID: <Pine.LNX.4.64.0604111725590.14565@g5.osdl.org>
>   Colored diff? (Linus Torvalds)
> 
>   I am not opposed to it, but I'd like to do that internally if
>   we go this route.  Needs to wait "option parsing".  Also
>   Message-ID: <3536.10.10.10.24.1114117965.squirrel@linux1> is
>   slightly related to this.

Moving it internal sounds like a good idea.  Would you be open to
including the GIT_DIFF_PAGER option now anyway?   It has utility
beyond just color diffs.

Sean

^ permalink raw reply

* Re: Recent unresolved issues
From: Petr Baudis @ 2006-04-14 19:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Fri, Apr 14, 2006 at 11:31:36AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> * Message-ID: <Pine.LNX.4.64.0604111725590.14565@g5.osdl.org>
>   Colored diff? (Linus Torvalds)
> 
>   I am not opposed to it, but I'd like to do that internally if
>   we go this route.  Needs to wait "option parsing".  Also
>   Message-ID: <3536.10.10.10.24.1114117965.squirrel@linux1> is
>   slightly related to this.

It might be worthwhile to make Git and Cogito compatible if you offer
colors customization. Cogito lets the user customize the colors through
the $CG_COLORS variable (see cg-diff(1)).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* git-svn and Author files question
From: Seth Falcon @ 2006-04-14 20:34 UTC (permalink / raw)
  To: git

Hi all,

I've been using git to manually track changes to a project that uses
svn as its primary SCM.

git-svn looks like it can help me streamline my workflow, but I'm
getting stuck with the following:

    mkdir foo
    cd foo
    git-svn init $URL  <--- the svn URL
    git-svn fetch
    Author: dfcimm3 not defined in  file

:-(

Can someone point me to the file and the place that describes what I
should put in it?  There are many committers to the svn project.  I'm
hoping that I will not have to enumerate all of their names in some
file.

I'm using git version 1.3.0.rc1.g40e9, and BTW, enjoying it very much.

Thanks,

+ seth

^ permalink raw reply

* git log is a bit antisocial
From: Nicolas Pitre @ 2006-04-14 20:50 UTC (permalink / raw)
  To: git


$  git log -h
fatal: unrecognized argument: -h
$ git log --help
fatal: unrecognized argument: --help

Maybe the usage string could be printed in those cases?


Nicolas

^ permalink raw reply

* Re: git log is a bit antisocial
From: Junio C Hamano @ 2006-04-14 20:56 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604141647360.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> $  git log -h
> fatal: unrecognized argument: -h
> $ git log --help
> fatal: unrecognized argument: --help
>
> Maybe the usage string could be printed in those cases?

Perhaps.  Alternatively, "git help log", perhaps.

^ permalink raw reply

* Re: git log is a bit antisocial
From: Nicolas Pitre @ 2006-04-14 21:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vlku7q3k7.fsf@assigned-by-dhcp.cox.net>

On Fri, 14 Apr 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > $  git log -h
> > fatal: unrecognized argument: -h
> > $ git log --help
> > fatal: unrecognized argument: --help
> >
> > Maybe the usage string could be printed in those cases?
> 
> Perhaps.  Alternatively, "git help log", perhaps.

What about git-log then?


Nicolas

^ permalink raw reply

* Re: git log is a bit antisocial
From: Junio C Hamano @ 2006-04-14 21:28 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604141719290.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Fri, 14 Apr 2006, Junio C Hamano wrote:
>
>> Nicolas Pitre <nico@cam.org> writes:
>> 
>> > $  git log -h
>> > fatal: unrecognized argument: -h
>> > $ git log --help
>> > fatal: unrecognized argument: --help
>> >
>> > Maybe the usage string could be printed in those cases?
>> 
>> Perhaps.  Alternatively, "git help log", perhaps.
>
> What about git-log then?

What about it?

Asking for help on log could be spelled as "git log --help" with
a patch like the attached, but I am not sure that is worth it...

-- >8 --
diff --git a/git.c b/git.c
index 78ed403..7fdacdd 100644
--- a/git.c
+++ b/git.c
@@ -497,6 +497,16 @@ int main(int argc, const char **argv, ch
 	}
 	argv[0] = cmd;
 
+	/* It could be git blah --help or git boo -h, but be
+	 * careful; most commands have their own '-h' and '--help'.
+	 */
+	if (argc == 2 &&
+	    (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))) {
+		argv[0] = "help";
+		argv[1] = cmd;
+		exit(cmd_help(1, argv, envp));
+	}
+
 	/*
 	 * We search for git commands in the following order:
 	 *  - git_exec_path()

^ permalink raw reply related

* Re: git log is a bit antisocial
From: Sébastien Pierre @ 2006-04-14 21:44 UTC (permalink / raw)
  To: git
In-Reply-To: <7vhd4vq23h.fsf@assigned-by-dhcp.cox.net>

Le vendredi 14 avril 2006 à 14:28 -0700, Junio C Hamano a écrit :

> What about it?
> 
> Asking for help on log could be spelled as "git log --help" with
> a patch like the attached, but I am not sure that is worth it...

I would say that it is very useful to newbies, or simply not to
frustrate users trying to get help. It is really worth it, at least for
me.

 -- Sébastien

^ permalink raw reply

* Re: git log is a bit antisocial
From: Junio C Hamano @ 2006-04-14 22:06 UTC (permalink / raw)
  To: Sébastien Pierre; +Cc: git
In-Reply-To: <1145051072.27704.1.camel@localhost.localdomain>

Sébastien Pierre <sebastien@xprima.com> writes:

> Le vendredi 14 avril 2006 à 14:28 -0700, Junio C Hamano a écrit :
>
>> What about it?
>> 
>> Asking for help on log could be spelled as "git log --help" with
>> a patch like the attached, but I am not sure that is worth it...
>
> I would say that it is very useful to newbies, or simply not to
> frustrate users trying to get help. It is really worth it, at least for
> me.

Have you read the patch, especially the comment in it?  With and
without the patch, this command would behave quite differently:

	$ git commit --help
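
A toy model of the check in the patch shows the difference.  Here
dispatch is a hypothetical stand-in for the relevant part of main()
in git.c, after argv has been shifted so that argv[0] is the command
name; it only reports which command would end up seeing the
arguments.

```python
def dispatch(argv, patched):
    """Return (command, args) that would run for 'git <argv...>'."""
    cmd, args = argv[0], list(argv[1:])
    if patched and len(argv) == 2 and args[0] in ("-h", "--help"):
        return "help", [cmd]      # rewritten to "git help <cmd>"
    return cmd, args              # the command parses its own options


print(dispatch(["commit", "--help"], patched=False))
print(dispatch(["commit", "--help"], patched=True))
```

With the check in place, the command's own handling of '--help' is
never reached when it is the only argument.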

^ permalink raw reply

* Re: git log is a bit antisocial
From: Sébastien Pierre @ 2006-04-14 22:15 UTC (permalink / raw)
  To: git
In-Reply-To: <7vu08volrp.fsf@assigned-by-dhcp.cox.net>

Le vendredi 14 avril 2006 à 15:06 -0700, Junio C Hamano a écrit :

> Have you read the patch, especially the comment in it?  With and
> without the patch, this command would behave quite differently:

I did not realize that at first (I thought this would be a fallback
method). 

Anyway, on git 1.2.3, here is something interesting:

>> git log -h
fatal: Not a git repository

>> git log --help
Usage: /home/sebastien/Local/bin/git-log [--max-count=<n>]
[<since>..<limit>] [--pretty=<format>] [git-rev-list options]

Which is confusing, so having a consistent behaviour for "git help cmd",
"git cmd help", "git cmd -h" and "git cmd --help" would be nice.

For instance, Darcs works just like that, which makes it easy for
newbies to find their way around.

 -- Sébastien

^ permalink raw reply

* [PATCH] rev-list --bisect: limit list before bisecting.
From: Junio C Hamano @ 2006-04-14 22:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

I noticed bisect does not work well without both good and bad.
Running this script in git.git repository would give you quite
different results:

        #!/bin/sh
        initial=e83c5163316f89bfbde7d9ab23ca2e25604af290

        mid0=`git rev-list --bisect ^$initial --all`

        git rev-list $mid0 | wc -l
        git rev-list ^$mid0 --all | wc -l

        mid1=`git rev-list --bisect --all`

        git rev-list $mid1 | wc -l
        git rev-list ^$mid1 --all | wc -l

The $initial commit is the very first commit you made.  The
first midpoint bisects things evenly as designed, but the latter
does not.

The reason I got interested in this was because I was wondering
if something like the following would help people converting a
huge repository from foreign SCM, or preparing a repository to
be fetched over plain dumb HTTP only:

        #!/bin/sh

        N=4
        P=.git/objects/pack
        bottom=

        while test 0 \< $N
        do
                N=$((N-1))
                if test -z "$bottom"
                then
                        newbottom=`git rev-list --bisect --all`
                else
                        newbottom=`git rev-list --bisect ^$bottom --all`
                fi
                if test -z "$bottom"
                then
                        rev_list="$newbottom"
                elif test 0 = $N
                then
                        rev_list="^$bottom --all"
                else
                        rev_list="^$bottom $newbottom"
                fi
                p=$(git rev-list --unpacked --objects $rev_list |
                    git pack-objects $P/pack)
                git show-index <$P/pack-$p.idx | wc -l
                bottom=$newbottom
        done

The idea is to pack older half of the history to one pack, then
older half of the remaining history to another, to continue a
few times, using finer granularity as we get closer to the tip.

This may not matter, since for a truly huge history, running
bisect a number of times could be quite time consuming, and we
might be better off running "git rev-list --all" once into a
temporary file, and manually picking cut-off points from the
resulting list of commits.  After all we are talking about
"approximately half" for such a usage, and older history does
not matter much.
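
The alternative mentioned in the last paragraph, picking cut-off
points from a single list, could look like the sketch below.  It
assumes a linear list of commits ordered oldest first, which real
history is not, and halving_groups is a name invented for the
example.

```python
def halving_groups(commits, n_packs):
    """Split commits (oldest first) into groups: the older half, then the
    older half of what remains, and so on; the last group takes the rest."""
    groups = []
    rest = list(commits)
    while len(rest) > 1 and len(groups) < n_packs - 1:
        half = (len(rest) + 1) // 2
        groups.append(rest[:half])
        rest = rest[half:]
    if rest:
        groups.append(rest)
    return groups


if __name__ == "__main__":
    print([len(g) for g in halving_groups(range(16), 4)])
```

Each group would then be fed to pack-objects as one pack, giving the
same coarse-to-fine layout as the loop above without running bisect
repeatedly.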

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 rev-list.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/rev-list.c b/rev-list.c
index 963707a..cb67b39 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -371,6 +371,8 @@ int main(int argc, const char **argv)

 	save_commit_buffer = verbose_header;
 	track_object_refs = 0;
+	if (bisect_list)
+		revs.limited = 1;

 	prepare_revision_walk(&revs);
 	if (revs.tree_objects)

^ permalink raw reply related

* Re: git log is a bit antisocial
From: Junio C Hamano @ 2006-04-14 22:45 UTC (permalink / raw)
  To: Sébastien Pierre; +Cc: git
In-Reply-To: <1145052905.27704.8.camel@localhost.localdomain>

Sébastien Pierre <sebastien@xprima.com> writes:

> Anyway, on git 1.2.3, here is something interesting:
>
>>> git log -h
> fatal: Not a git repository
>
>>> git log --help
> Usage: /home/sebastien/Local/bin/git-log [--max-count=<n>]
> [<since>..<limit>] [--pretty=<format>] [git-rev-list options]

You are talking about the old codebase in the maintenance branch,
which is an independent issue, but thanks for noticing anyway.

The attached patch would help with that.

> Which is confusing, so having a consistent behaviour for "git help cmd",
> "git cmd help", "git cmd -h" and "git cmd --help" would be nice.
>
> For instance, Darcs works just like that, which makes it easy for
> newbies to find there ways through.

Patches welcome, but a new development should be based on the
"master" branch, not the maintenance 1.2.X series.

-- >8 --
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 025ef2d..d15747f 100755
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -30,7 +30,7 @@ else
 fi
 
 case "$1" in
-	--h|--he|--hel|--help)
+	-h|--h|--he|--hel|--help)
 	echo "$LONG_USAGE"
 	exit
 esac

^ permalink raw reply related

* Re: Recent unresolved issues: shallow clone
From: Carl Worth @ 2006-04-14 22:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]

On Fri, 14 Apr 2006 02:31:36 -0700, Junio C Hamano wrote:
>   Shallow clones (Carl Worth).
> 
>   The experiment last round did not work out very well, but as
>   existing repositories get bigger, and more projects being
>   migrated from foreign SCM systems, this would become a
>   must-have from would-be-nice-to-have.
> 
>   I am beginning to think using "graft" to cauterize history
>   for this, while it technically would work, would not be so
>   helpful to users, so the design needs to be worked out again.

As context, here is some of what you mentioned in IRC:

>>	Suppose you have this:
>>
>>	A---B---C
>>	 \       \ 
>>	  D---E---F---G
>>	 
>>	and you made a shallow clone of C (because that is where the
>>	upstream master was when you made that clone).  Then the
>>	upstream updated the master branch tip to G.
>>
>>	The next update from upstream to your shallow clone would break.
>>	The upstream says: I have G at master.
>>	You say: I want G then.  By the way, I have C.
>>
>>	What it means to tell the other end "I have X" is to promise
>>	that you have X and _everything_ behind it.  So the upstream
>>	would send objects necessary to complete D, E, F and G for
>>	"somebody who already have A and B".  As a consequence, you
>>	would not see A nor B.
>>
>>	Even if the only thing you are interested in is to be in sync
>>	with the tip of the upstream, you can end up with an
>>	incomplete tree for G, if some of the blobs or trees contained
>>	in G already exist in A or B.  They are not sent -- because
>>	you told the upstream that you have everything necessary to
>>	get to C.

So that's an argument against using a cauterizing graft for the
shallow clone of C. It definitely confuses the existing protocol to
say "I have C" if I have only a cauterized C, (its tree only, but none
of the commits that should be backing C).
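
The failure described in the quoted IRC text can be reproduced with a
small model. This is only a sketch under stated assumptions: commits
stand in for all objects, the graph is the A..G diagram above, and the
names (parents, reachable, objects_sent, missing_after_fetch) are
hypothetical helpers, not git's own code. It assumes the sender
transmits everything reachable from "want" that is not reachable from
"have".

```c
#include <assert.h>

/* The seven commits from the diagram: A-B-C on top, D-E-F-G below. */
enum { A, B, C, D, E, F, G, NR_COMMITS };

/* parents[x] is a bitmask of the direct parents of commit x. */
static const unsigned parents[NR_COMMITS] = {
	[A] = 0,
	[B] = 1u << A,
	[C] = 1u << B,
	[D] = 1u << A,
	[E] = 1u << D,
	[F] = (1u << E) | (1u << C),	/* F merges E and C */
	[G] = 1u << F,
};

/* Bitmask of x and everything behind it. */
unsigned reachable(int x)
{
	unsigned set = 0;
	unsigned todo;
	int i;

	assert(0 <= x && x < NR_COMMITS);
	todo = 1u << x;
	while (todo) {
		for (i = 0; i < NR_COMMITS; i++) {
			if (!(todo & (1u << i)))
				continue;
			todo &= ~(1u << i);
			if (!(set & (1u << i))) {
				set |= 1u << i;
				todo |= parents[i];
			}
		}
	}
	return set;
}

/* What the sender transmits: "have X" is read as "X and all behind it". */
unsigned objects_sent(int want, int have)
{
	return reachable(want) & ~reachable(have);
}

/* Commits still absent on the receiver once the fetch has finished. */
unsigned missing_after_fetch(unsigned local, int want, int have)
{
	unsigned now = local | objects_sent(want, have);
	return reachable(want) & ~now;
}
```

With a full clone of C, missing_after_fetch(reachable(C), G, C) is
empty. With a cauterized clone that holds only C itself,
missing_after_fetch(1u << C, G, C) leaves A and B missing, which is
the situation the IRC text describes.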

I also read over some of your discussion of extending the protocol
with a new "shallow" extension.

I'm wondering if the shallow clone support couldn't be achieved
through a simpler tweak to the protocol semantics, (and no change to
protocol syntax), that would avoid the problem above. Specifically,
for shallow stuff, could we just do the same "want" and "have"
conversation with tree objects rather than commit objects?

So, in the scenario above, the original shallow clone of C would be:

	Want C->tree, have nothing.

and the later shallow update to G would be:

	Want G->tree, have C->tree

A final step of a shallow clone would then require creating a new
parent-less commit object so that there's something to point a ref
under refs/heads at, (or maybe rather than being parentless, they
could be chained together with each update?).

I admit that this would result in a rather atypical kind of
repository, but it would contain plenty of valid trees and blobs, so
it should conceptually be fairly easy to promote such a thing to a
full repository.

But, even without any tool support for promotion, the ability to do
shallow clone and shallow updates would still provide a useful
capability [*].

-Carl

[*] For reference, what I'm looking for here is a way to justify
providing git support for jhbuild, which is a tool used by testers of
GNOME and other software to efficiently track the latest development
of an arbitrarily large number of packages. It's currently primarily a
CVS-based thing. Switching to git would be a huge win for the
incremental updates, but would currently cause quite a hit for the
first clone.

[-- Attachment #2: Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply

* Re: Recent unresolved issues
From: Linus Torvalds @ 2006-04-14 23:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>



On Fri, 14 Apr 2006, Junio C Hamano wrote:
> 
> * Message-ID: <Pine.LNX.4.64.0604121828370.14565@g5.osdl.org>
>   Common option parsing (Linus Torvalds)

Ok, here's a first cut at starting this.

This basically does a few things that are sadly somewhat interdependent, 
and nontrivial to split out:

 - get rid of "struct log_tree_opt"

   The fields in "log_tree_opt" are moved into "struct rev_info", and all 
   users of log_tree_opt are changed to use the rev_info struct instead.

 - add the parsing for the log_tree_opt arguments to "setup_revisions()"

 - make setup_revisions() set a flag (revs->diff) if the diff-related 
   arguments were used. This allows "git log" to decide whether it wants 
   to show diffs or not.

 - make setup_revisions() also initialize the diffopt part of rev_info 
   (which we had from before, but we just didn't initialize it)

 - make setup_revisions() do all the "finishing touches" on it all (it will 
   do the proper flag combination logic, and call "diff_setup_done()")
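
The shape of that list can be sketched in a few lines. This is a toy
under stated assumptions, not the code in the patch: struct toy_opts
and toy_setup() are hypothetical names, only four options are
handled, a single "patch" bit stands in for the DIFF_FORMAT_PATCH
output format, and the default of ignoring merges is borrowed from
the removed init_log_tree_opt().

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for the diff-related part of "struct rev_info". */
struct toy_opts {
	unsigned diff:1,		/* some diff-related argument was seen */
		 recursive:1,
		 patch:1,		/* stands in for DIFF_FORMAT_PATCH */
		 ignore_merges:1,
		 combine_merges:1,
		 dense_combined_merges:1;
};

/*
 * Parse the options we know, move everything else to the front of
 * argv, apply the flag combination rules once, and return the new
 * argument count (argv[0] included).
 */
int toy_setup(int argc, const char **argv, struct toy_opts *o)
{
	int i, left = 1;

	assert(argc >= 1 && argv && o);
	memset(o, 0, sizeof(*o));
	o->ignore_merges = 1;

	for (i = 1; i < argc; i++) {
		const char *arg = argv[i];

		if (!strcmp(arg, "-r")) {
			o->diff = 1;
			o->recursive = 1;
			continue;
		}
		if (!strcmp(arg, "-p")) {
			o->diff = 1;
			o->patch = 1;
			continue;
		}
		if (!strcmp(arg, "-c")) {
			o->diff = 1;
			o->combine_merges = 1;
			continue;
		}
		if (!strcmp(arg, "--cc")) {
			o->diff = 1;
			o->dense_combined_merges = 1;
			o->combine_merges = 1;
			continue;
		}
		argv[left++] = arg;	/* not ours: leave it for the caller */
	}

	/* The "finishing touches": done once, after all arguments. */
	if (o->combine_merges) {
		o->ignore_merges = 0;
		if (o->dense_combined_merges)
			o->patch = 1;
	}
	if (o->patch)
		o->recursive = 1;
	return left;
}
```

A caller in the style of the new cmd_log() would then only need to
check whether the returned count is greater than 1 to detect an
unrecognized argument, and look at the diff bit to decide whether to
show diffs at all.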

Now, that was the easy and straightforward part.

The slightly more involved part is that some of the programs that want to 
use the new-and-improved rev_info parsing don't actually want _commits_, 
they may want tree'ish arguments instead. That meant that I had to change 
setup_revisions() to parse the arguments not into the "revs->commits" list, 
but into the "revs->pending_objects" list.

Then, when we do "prepare_revision_walk()", we walk that list, and create 
the sorted commit list from there. 

This actually cleaned some stuff up, but it's the less obvious part of the 
patch, and it re-organized the "revision.c" logic somewhat. It actually 
paves the way for splitting argument parsing _entirely_ out of 
"revision.c", since now the argument parsing really is totally independent 
of the commit walking. That didn't use to be true, since there was lots of 
overlap with get_commit_reference() handling etc. Now the _only_ overlap 
is the shared (and trivial) "add_pending_object()" thing.
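
The two-phase idea described above (parse into a pending list first,
build the sorted commit list later) might look roughly like this. It
is a simplified sketch with hypothetical toy_* names and fixed-size
arrays instead of git's linked lists; tag peeling, UNINTERESTING
handling and tree walking are all left out.

```c
#include <assert.h>

enum toy_type { TOY_COMMIT, TOY_TREE };

#define TOY_SEEN 1u
#define TOY_MAX  16

struct toy_object {
	enum toy_type type;
	unsigned flags;
	unsigned long date;
};

struct toy_revs {
	struct toy_object *pending[TOY_MAX];	/* in argument order */
	int nr_pending;
	struct toy_object *commits[TOY_MAX];	/* newest first */
	int nr_commits;
};

/* Phase 1: argument parsing only records what was named. */
void toy_add_pending(struct toy_revs *revs, struct toy_object *obj)
{
	assert(revs->nr_pending < TOY_MAX);
	revs->pending[revs->nr_pending++] = obj;
}

/*
 * Phase 2: walk the pending list, keep each commit once, and insert
 * it into the commit list ordered by date (newest first).
 */
void toy_prepare_walk(struct toy_revs *revs)
{
	int i, j;

	for (i = 0; i < revs->nr_pending; i++) {
		struct toy_object *obj = revs->pending[i];

		if (obj->type != TOY_COMMIT || (obj->flags & TOY_SEEN))
			continue;
		obj->flags |= TOY_SEEN;

		assert(revs->nr_commits < TOY_MAX);
		j = revs->nr_commits++;
		while (j > 0 && revs->commits[j - 1]->date < obj->date) {
			revs->commits[j] = revs->commits[j - 1];
			j--;
		}
		revs->commits[j] = obj;
	}
	revs->nr_pending = 0;
}
```

A diff-tree style caller that wants tree'ish arguments would read the
pending array directly and never call toy_prepare_walk(), while a
log-style caller would call it and then iterate over the commits.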

However, I didn't do that file split, just because I wanted the diff 
itself to be smaller, and show the actual changes more clearly. If this 
gets accepted, I'll do further cleanups then - that includes the file 
split, but also using the new infrastructure to do a nicer "git diff" etc.

Even in this form, it actually ends up removing more lines than it adds.

It's nice to note how simple and straightforward this makes the built-in 
"git log" command, even though it continues to support all the diff flags 
too. It doesn't get much simpler than this.

I think this is worth merging soonish, because it does allow for future 
cleanup and even more sharing of code. However, it obviously touches 
"revision.c", which is subtle. I've tested that it passes all the tests we 
have, and it passes my "looks sane" detector, but somebody else should 
also give it a good look-over.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

 diff-tree.c |   91 ++++++++++++++++-------------------
 git.c       |   68 ++------------------------
 log-tree.c  |   60 ++---------------------
 log-tree.h  |   22 ++------
 revision.c  |  155 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 revision.h  |   18 +++++++
 6 files changed, 202 insertions(+), 212 deletions(-)

diff --git a/diff-tree.c b/diff-tree.c
index 2b79dd0..54157e4 100644
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -3,7 +3,7 @@ #include "diff.h"
 #include "commit.h"
 #include "log-tree.h"
 
-static struct log_tree_opt log_tree_opt;
+static struct rev_info log_tree_opt;
 
 static int diff_tree_commit_sha1(const unsigned char *sha1)
 {
@@ -62,66 +62,55 @@ int main(int argc, const char **argv)
 {
 	int nr_sha1;
 	char line[1000];
-	unsigned char sha1[2][20];
-	const char *prefix = setup_git_directory();
-	static struct log_tree_opt *opt = &log_tree_opt;
+	struct object *tree1, *tree2;
+	static struct rev_info *opt = &log_tree_opt;
+	struct object_list *list;
 	int read_stdin = 0;
 
 	git_config(git_diff_config);
 	nr_sha1 = 0;
-	init_log_tree_opt(opt);
+	argc = setup_revisions(argc, argv, opt, NULL);
 
-	for (;;) {
-		int opt_cnt;
-		const char *arg;
+	while (--argc > 0) {
+		const char *arg = *++argv;
 
-		argv++;
-		argc--;
-		arg = *argv;
-		if (!arg)
-			break;
-
-		if (*arg != '-') {
-			if (nr_sha1 < 2 && !get_sha1(arg, sha1[nr_sha1])) {
-				nr_sha1++;
-				continue;
-			}
-			break;
-		}
-
-		opt_cnt = log_tree_opt_parse(opt, argv, argc);
-		if (opt_cnt < 0)
-			usage(diff_tree_usage);
-		else if (opt_cnt) {
-			argv += opt_cnt - 1;
-			argc -= opt_cnt - 1;
-			continue;
-		}
-
-		if (!strcmp(arg, "--")) {
-			argv++;
-			argc--;
-			break;
-		}
 		if (!strcmp(arg, "--stdin")) {
 			read_stdin = 1;
 			continue;
 		}
 		usage(diff_tree_usage);
 	}
-
-	if (opt->combine_merges)
-		opt->ignore_merges = 0;
-
-	/* We can only do dense combined merges with diff output */
-	if (opt->dense_combined_merges)
-		opt->diffopt.output_format = DIFF_FORMAT_PATCH;
-
-	if (opt->diffopt.output_format == DIFF_FORMAT_PATCH)
-		opt->diffopt.recursive = 1;
 
-	diff_tree_setup_paths(get_pathspec(prefix, argv), opt);
-	diff_setup_done(&opt->diffopt);
+	/*
+	 * NOTE! "setup_revisions()" will have inserted the revisions
+	 * it parsed in reverse order. So if you do
+	 *
+	 *	git-diff-tree a b
+	 *
+	 * the commit list will be "b" -> "a" -> NULL, so we reverse
+	 * the order of the objects if the first one is not marked
+	 * UNINTERESTING.
+	 */
+	nr_sha1 = 0;
+	list = opt->pending_objects;
+	if (list) {
+		nr_sha1++;
+		tree1 = list->item;
+		list = list->next;
+		if (list) {
+			nr_sha1++;
+			tree2 = tree1;
+			tree1 = list->item;
+			if (list->next)
+				usage(diff_tree_usage);
+			/* Switch them around if the second one was uninteresting.. */
+			if (tree2->flags & UNINTERESTING) {
+				struct object *tmp = tree2;
+				tree2 = tree1;
+				tree1 = tmp;
+			}
+		}
+	}
 
 	switch (nr_sha1) {
 	case 0:
@@ -129,10 +118,12 @@ int main(int argc, const char **argv)
 			usage(diff_tree_usage);
 		break;
 	case 1:
-		diff_tree_commit_sha1(sha1[0]);
+		diff_tree_commit_sha1(tree1->sha1);
 		break;
 	case 2:
-		diff_tree_sha1(sha1[0], sha1[1], "", &opt->diffopt);
+		diff_tree_sha1(tree1->sha1,
+			       tree2->sha1,
+			       "", &opt->diffopt);
 		log_tree_diff_flush(opt);
 		break;
 	}
diff --git a/git.c b/git.c
index 78ed403..e8d1fcc 100644
--- a/git.c
+++ b/git.c
@@ -287,74 +287,18 @@ static int cmd_log(int argc, const char 
 	int abbrev = DEFAULT_ABBREV;
 	int abbrev_commit = 0;
 	const char *commit_prefix = "commit ";
-	struct log_tree_opt opt;
 	int shown = 0;
-	int do_diff = 0;
-	int full_diff = 0;
 
-	init_log_tree_opt(&opt);
 	argc = setup_revisions(argc, argv, &rev, "HEAD");
-	while (1 < argc) {
-		const char *arg = argv[1];
-		if (!strncmp(arg, "--pretty", 8)) {
-			commit_format = get_commit_format(arg + 8);
-			if (commit_format == CMIT_FMT_ONELINE)
-				commit_prefix = "";
-		}
-		else if (!strcmp(arg, "--no-abbrev")) {
-			abbrev = 0;
-		}
-		else if (!strcmp(arg, "--abbrev")) {
-			abbrev = DEFAULT_ABBREV;
-		}
-		else if (!strcmp(arg, "--abbrev-commit")) {
-			abbrev_commit = 1;
-		}
-		else if (!strncmp(arg, "--abbrev=", 9)) {
-			abbrev = strtoul(arg + 9, NULL, 10);
-			if (abbrev && abbrev < MINIMUM_ABBREV)
-				abbrev = MINIMUM_ABBREV;
-			else if (40 < abbrev)
-				abbrev = 40;
-		}
-		else if (!strcmp(arg, "--full-diff")) {
-			do_diff = 1;
-			full_diff = 1;
-		}
-		else {
-			int cnt = log_tree_opt_parse(&opt, argv+1, argc-1);
-			if (0 < cnt) {
-				do_diff = 1;
-				argv += cnt;
-				argc -= cnt;
-				continue;
-			}
-			die("unrecognized argument: %s", arg);
-		}
+	if (argc > 1)
+		die("unrecognized argument: %s", argv[1]);
 
-		argc--; argv++;
-	}
-
-	if (do_diff) {
-		opt.diffopt.abbrev = abbrev;
-		opt.verbose_header = 0;
-		opt.always_show_header = 0;
-		opt.no_commit_id = 1;
-		if (opt.combine_merges)
-			opt.ignore_merges = 0;
-		if (opt.dense_combined_merges)
-			opt.diffopt.output_format = DIFF_FORMAT_PATCH;
-		if (opt.diffopt.output_format == DIFF_FORMAT_PATCH)
-			opt.diffopt.recursive = 1;
-		if (!full_diff && rev.prune_data)
-			diff_tree_setup_paths(rev.prune_data, &opt.diffopt);
-		diff_setup_done(&opt.diffopt);
-	}
+	rev.no_commit_id = 1;
 
 	prepare_revision_walk(&rev);
 	setup_pager();
 	while ((commit = get_revision(&rev)) != NULL) {
-		if (shown && do_diff && commit_format != CMIT_FMT_ONELINE)
+		if (shown && rev.diff && commit_format != CMIT_FMT_ONELINE)
 			putchar('\n');
 		fputs(commit_prefix, stdout);
 		if (abbrev_commit && abbrev)
@@ -388,8 +332,8 @@ static int cmd_log(int argc, const char 
 		pretty_print_commit(commit_format, commit, ~0, buf,
 				    LOGSIZE, abbrev);
 		printf("%s\n", buf);
-		if (do_diff)
-			log_tree_commit(&opt, commit);
+		if (rev.diff)
+			log_tree_commit(&rev, commit);
 		shown = 1;
 		free(commit->buffer);
 		commit->buffer = NULL;
diff --git a/log-tree.c b/log-tree.c
index 3d40482..04a68e0 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -3,58 +3,8 @@ #include "diff.h"
 #include "commit.h"
 #include "log-tree.h"
 
-void init_log_tree_opt(struct log_tree_opt *opt)
+int log_tree_diff_flush(struct rev_info *opt)
 {
-	memset(opt, 0, sizeof *opt);
-	opt->ignore_merges = 1;
-	opt->header_prefix = "";
-	opt->commit_format = CMIT_FMT_RAW;
-	diff_setup(&opt->diffopt);
-}
-
-int log_tree_opt_parse(struct log_tree_opt *opt, const char **av, int ac)
-{
-	const char *arg;
-	int cnt = diff_opt_parse(&opt->diffopt, av, ac);
-	if (0 < cnt)
-		return cnt;
-	arg = *av;
-	if (!strcmp(arg, "-r"))
-		opt->diffopt.recursive = 1;
-	else if (!strcmp(arg, "-t")) {
-		opt->diffopt.recursive = 1;
-		opt->diffopt.tree_in_recursive = 1;
-	}
-	else if (!strcmp(arg, "-m"))
-		opt->ignore_merges = 0;
-	else if (!strcmp(arg, "-c"))
-		opt->combine_merges = 1;
-	else if (!strcmp(arg, "--cc")) {
-		opt->dense_combined_merges = 1;
-		opt->combine_merges = 1;
-	}
-	else if (!strcmp(arg, "-v")) {
-		opt->verbose_header = 1;
-		opt->header_prefix = "diff-tree ";
-	}
-	else if (!strncmp(arg, "--pretty", 8)) {
-		opt->verbose_header = 1;
-		opt->header_prefix = "diff-tree ";
-		opt->commit_format = get_commit_format(arg+8);
-	}
-	else if (!strcmp(arg, "--root"))
-		opt->show_root_diff = 1;
-	else if (!strcmp(arg, "--no-commit-id"))
-		opt->no_commit_id = 1;
-	else if (!strcmp(arg, "--always"))
-		opt->always_show_header = 1;
-	else
-		return 0;
-	return 1;
-}
-
-int log_tree_diff_flush(struct log_tree_opt *opt)
-{
 	diffcore_std(&opt->diffopt);
 	if (diff_queue_is_empty()) {
 		int saved_fmt = opt->diffopt.output_format;
@@ -73,7 +23,7 @@ int log_tree_diff_flush(struct log_tree_
 	return 1;
 }
 
-static int diff_root_tree(struct log_tree_opt *opt,
+static int diff_root_tree(struct rev_info *opt,
 			  const unsigned char *new, const char *base)
 {
 	int retval;
@@ -93,7 +43,7 @@ static int diff_root_tree(struct log_tre
 	return retval;
 }
 
-static const char *generate_header(struct log_tree_opt *opt,
+static const char *generate_header(struct rev_info *opt,
 				   const unsigned char *commit_sha1,
 				   const unsigned char *parent_sha1,
 				   const struct commit *commit)
@@ -129,7 +79,7 @@ static const char *generate_header(struc
 	return this_header;
 }
 
-static int do_diff_combined(struct log_tree_opt *opt, struct commit *commit)
+static int do_diff_combined(struct rev_info *opt, struct commit *commit)
 {
 	unsigned const char *sha1 = commit->object.sha1;
 
@@ -142,7 +92,7 @@ static int do_diff_combined(struct log_t
 	return 0;
 }
 
-int log_tree_commit(struct log_tree_opt *opt, struct commit *commit)
+int log_tree_commit(struct rev_info *opt, struct commit *commit)
 {
 	struct commit_list *parents;
 	unsigned const char *sha1 = commit->object.sha1;
diff --git a/log-tree.h b/log-tree.h
index da166c6..91a909b 100644
--- a/log-tree.h
+++ b/log-tree.h
@@ -1,23 +1,11 @@
 #ifndef LOG_TREE_H
 #define LOG_TREE_H
 
-struct log_tree_opt {
-	struct diff_options diffopt;
-	int show_root_diff;
-	int no_commit_id;
-	int verbose_header;
-	int ignore_merges;
-	int combine_merges;
-	int dense_combined_merges;
-	int always_show_header;
-	const char *header_prefix;
-	const char *header;
-	enum cmit_fmt commit_format;
-};
+#include "revision.h"
 
-void init_log_tree_opt(struct log_tree_opt *);
-int log_tree_diff_flush(struct log_tree_opt *);
-int log_tree_commit(struct log_tree_opt *, struct commit *);
-int log_tree_opt_parse(struct log_tree_opt *, const char **, int);
+void init_log_tree_opt(struct rev_info *);
+int log_tree_diff_flush(struct rev_info *);
+int log_tree_commit(struct rev_info *, struct commit *);
+int log_tree_opt_parse(struct rev_info *, const char **, int);
 
 #endif
diff --git a/revision.c b/revision.c
index 0505f3f..99077af 100644
--- a/revision.c
+++ b/revision.c
@@ -116,21 +116,27 @@ static void add_pending_object(struct re
 	add_object(obj, &revs->pending_objects, NULL, name);
 }
 
-static struct commit *get_commit_reference(struct rev_info *revs, const char *name, const unsigned char *sha1, unsigned int flags)
+static struct object *get_reference(struct rev_info *revs, const char *name, const unsigned char *sha1, unsigned int flags)
 {
 	struct object *object;
 
 	object = parse_object(sha1);
 	if (!object)
 		die("bad object %s", name);
+	object->flags |= flags;
+	return object;
+}
+
+static struct commit *handle_commit(struct rev_info *revs, struct object *object, const char *name)
+{
+	unsigned long flags = object->flags;
 
 	/*
 	 * Tag object? Look what it points to..
 	 */
 	while (object->type == tag_type) {
 		struct tag *tag = (struct tag *) object;
-		object->flags |= flags;
-		if (revs->tag_objects && !(object->flags & UNINTERESTING))
+		if (revs->tag_objects && !(flags & UNINTERESTING))
 			add_pending_object(revs, object, tag->tag);
 		object = parse_object(tag->tagged->sha1);
 		if (!object)
@@ -143,7 +149,6 @@ static struct commit *get_commit_referen
 	 */
 	if (object->type == commit_type) {
 		struct commit *commit = (struct commit *)object;
-		object->flags |= flags;
 		if (parse_commit(commit) < 0)
 			die("unable to parse commit %s", name);
 		if (flags & UNINTERESTING) {
@@ -449,14 +454,6 @@ static void limit_list(struct rev_info *
 		}
 	}
 	revs->commits = newlist;
-}
-
-static void add_one_commit(struct commit *commit, struct rev_info *revs)
-{
-	if (!commit || (commit->object.flags & SEEN))
-		return;
-	commit->object.flags |= SEEN;
-	commit_list_insert(commit, &revs->commits);
 }
 
 static int all_flags;
@@ -464,8 +461,8 @@ static struct rev_info *all_revs;
 
 static int handle_one_ref(const char *path, const unsigned char *sha1)
 {
-	struct commit *commit = get_commit_reference(all_revs, path, sha1, all_flags);
-	add_one_commit(commit, all_revs);
+	struct object *object = get_reference(all_revs, path, sha1, all_flags);
+	add_pending_object(all_revs, object, "");
 	return 0;
 }
 
@@ -494,6 +491,11 @@ void init_revisions(struct rev_info *rev
 
 	revs->topo_setter = topo_sort_default_setter;
 	revs->topo_getter = topo_sort_default_getter;
+
+	revs->header_prefix = "";
+	revs->commit_format = CMIT_FMT_RAW;
+
+	diff_setup(&revs->diffopt);
 }
 
 /*
@@ -526,13 +528,14 @@ int setup_revisions(int argc, const char
 
 	flags = 0;
 	for (i = 1; i < argc; i++) {
-		struct commit *commit;
+		struct object *object;
 		const char *arg = argv[i];
 		unsigned char sha1[20];
 		char *dotdot;
 		int local_flags;
 
 		if (*arg == '-') {
+			int opts;
 			if (!strncmp(arg, "--max-count=", 12)) {
 				revs->max_count = atoi(arg + 12);
 				continue;
@@ -638,6 +641,78 @@ int setup_revisions(int argc, const char
 			}
 			if (!strcmp(arg, "--unpacked")) {
 				revs->unpacked = 1;
+				continue;
+			}
+			if (!strcmp(arg, "-r")) {
+				revs->diff = 1;
+				revs->diffopt.recursive = 1;
+				continue;
+			}
+			if (!strcmp(arg, "-t")) {
+				revs->diff = 1;
+				revs->diffopt.recursive = 1;
+				revs->diffopt.tree_in_recursive = 1;
+				continue;
+			}
+			if (!strcmp(arg, "-m")) {
+				revs->ignore_merges = 0;
+				continue;
+			}
+			if (!strcmp(arg, "-c")) {
+				revs->diff = 1;
+				revs->combine_merges = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--cc")) {
+				revs->diff = 1;
+				revs->dense_combined_merges = 1;
+				revs->combine_merges = 1;
+				continue;
+			}
+			if (!strcmp(arg, "-v")) {
+				revs->verbose_header = 1;
+				revs->header_prefix = "diff-tree ";
+				continue;
+			}
+			if (!strncmp(arg, "--pretty", 8)) {
+				revs->verbose_header = 1;
+				revs->header_prefix = "diff-tree ";
+				revs->commit_format = get_commit_format(arg+8);
+				continue;
+			}
+			if (!strcmp(arg, "--root")) {
+				revs->show_root_diff = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--no-commit-id")) {
+				revs->no_commit_id = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--always")) {
+				revs->always_show_header = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--no-abbrev")) {
+				revs->abbrev = 0;
+				continue;
+			}
+			if (!strcmp(arg, "--abbrev")) {
+				revs->abbrev = DEFAULT_ABBREV;
+				continue;
+			}
+			if (!strcmp(arg, "--abbrev-commit")) {
+				revs->abbrev_commit = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--full-diff")) {
+				revs->diff = 1;
+				revs->full_diff = 1;
+				continue;
+			}
+			opts = diff_opt_parse(&revs->diffopt, argv+i, argc-i);
+			if (opts > 0) {
+				revs->diff = 1;
+				i += opts - 1;
 				continue;
 			}
 			*unrecognized++ = arg;
@@ -656,15 +731,15 @@ int setup_revisions(int argc, const char
 				this = "HEAD";
 			if (!get_sha1(this, from_sha1) &&
 			    !get_sha1(next, sha1)) {
-				struct commit *exclude;
-				struct commit *include;
+				struct object *exclude;
+				struct object *include;
 
-				exclude = get_commit_reference(revs, this, from_sha1, flags ^ UNINTERESTING);
-				include = get_commit_reference(revs, next, sha1, flags);
+				exclude = get_reference(revs, this, from_sha1, flags ^ UNINTERESTING);
+				include = get_reference(revs, next, sha1, flags);
 				if (!exclude || !include)
 					die("Invalid revision range %s..%s", arg, next);
-				add_one_commit(exclude, revs);
-				add_one_commit(include, revs);
+				add_pending_object(revs, exclude, this);
+				add_pending_object(revs, include, next);
 				continue;
 			}
 			*dotdot = '.';
@@ -689,16 +764,16 @@ int setup_revisions(int argc, const char
 			revs->prune_data = get_pathspec(revs->prefix, argv + i);
 			break;
 		}
-		commit = get_commit_reference(revs, arg, sha1, flags ^ local_flags);
-		add_one_commit(commit, revs);
+		object = get_reference(revs, arg, sha1, flags ^ local_flags);
+		add_pending_object(revs, object, arg);
 	}
-	if (def && !revs->commits) {
+	if (def && !revs->pending_objects) {
 		unsigned char sha1[20];
-		struct commit *commit;
+		struct object *object;
 		if (get_sha1(def, sha1) < 0)
 			die("bad default revision '%s'", def);
-		commit = get_commit_reference(revs, def, sha1, 0);
-		add_one_commit(commit, revs);
+		object = get_reference(revs, def, sha1, 0);
+		add_pending_object(revs, object, def);
 	}
 
 	if (revs->topo_order || revs->unpacked)
@@ -708,13 +783,37 @@ int setup_revisions(int argc, const char
 		diff_tree_setup_paths(revs->prune_data, &revs->diffopt);
 		revs->prune_fn = try_to_simplify_commit;
 	}
+	if (revs->combine_merges) {
+		revs->ignore_merges = 0;
+		if (revs->dense_combined_merges)
+			revs->diffopt.output_format = DIFF_FORMAT_PATCH;
+	}
+	if (revs->diffopt.output_format == DIFF_FORMAT_PATCH)
+		revs->diffopt.recursive = 1;
+	if (!revs->full_diff && revs->prune_data)
+		diff_tree_setup_paths(revs->prune_data, &revs->diffopt);
+	diff_setup_done(&revs->diffopt);
 
 	return left;
 }
 
 void prepare_revision_walk(struct rev_info *revs)
 {
-	sort_by_date(&revs->commits);
+	struct object_list *list;
+
+	list = revs->pending_objects;
+	revs->pending_objects = NULL;
+	while (list) {
+		struct commit *commit = handle_commit(revs, list->item, list->name);
+		if (commit) {
+			if (!(commit->object.flags & SEEN)) {
+				commit->object.flags |= SEEN;
+				insert_by_date(commit, &revs->commits);
+			}
+		}
+		list = list->next;
+	}
+
 	if (revs->limited)
 		limit_list(revs);
 	if (revs->topo_order)
diff --git a/revision.h b/revision.h
index 8970b57..9a45986 100644
--- a/revision.h
+++ b/revision.h
@@ -38,6 +38,24 @@ struct rev_info {
 			boundary:1,
 			parents:1;
 
+	/* Diff flags */
+	unsigned int	diff:1,
+			full_diff:1,
+			show_root_diff:1,
+			no_commit_id:1,
+			verbose_header:1,
+			ignore_merges:1,
+			combine_merges:1,
+			dense_combined_merges:1,
+			always_show_header:1;
+
+	/* Format info */
+	unsigned int	abbrev_commit:1;
+	unsigned int	abbrev;
+	enum cmit_fmt	commit_format;
+	const char	*header_prefix;
+	const char	*header;
+
 	/* special limits */
 	int max_count;
 	unsigned long max_age;

^ permalink raw reply related

* Re: Recent unresolved issues: shallow clone
From: Johannes Schindelin @ 2006-04-15  0:17 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, git
In-Reply-To: <87irpb7oma.wl%cworth@cworth.org>

Hi,

On Fri, 14 Apr 2006, Carl Worth wrote:

> I also read over some of your discussion of extending the protocol
> with a new "shallow" extension.
> 
> I'm wondering if the shallow clone support couldn't be achieved
> through a simpler tweak to the protocol semantics, (and no change to
> protocol syntax), that would avoid the problem above. Specifically,
> for shallow stuff, could we just do the same "want" and "have"
> conversation with tree objects rather than commit objects?

It would not help your problem at all. "have commit" really means that you 
have the commit and all its ancestors and their combined tree objects and 
the combined tree objects' blob objects.

If you have a cauterized history, you know that you are lacking some of 
them. But you don't know which ones.

Now, issuing a pull could mean to get an object which was present in an 
old revision, which you unfortunately do not have (because you have a cut 
off history). Boom.

I know, this is probably unlikely, but not at all *impossible*, so you 
have to take care of that case. And you need a protocol extension for 
that.

Hth,
Dscho

^ permalink raw reply
