Git development
* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Nguyen Thai Ngoc Duy @ 2007-11-30  0:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Jeff King, Johannes Schindelin, Jan Hudec, git
In-Reply-To: <7veje8twt2.fsf@gitster.siamese.dyndns.org>

On Nov 30, 2007 7:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
>  - Post v1.5.4, start cooking gitexecdir=$(libexecdir)/git-core, aiming
>    for inclusion in v1.5.5, perhaps in Feb-Mar 2008 timeframe.  This
>    will also affect the sample RPM spec and resulting RPM binary
>    packages I will place on k.org, and I'll ask Gerrit to do the same on
>    Debian side.  The official binary packaging of individual distros are
>    not under my control, but if there is a handy list of people I can
>    send this notice to for other distros, that would help this process.

You can find Gentoo maintainers here:

http://sources.gentoo.org/viewcvs.py/gentoo-x86/dev-util/git/metadata.xml?rev=1.6&view=markup

-- 
Duy


* Re: Fix a pathological case in git detecting proper renames
From: Junio C Hamano @ 2007-11-30  0:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kumar Gala, Git Mailing List
In-Reply-To: <alpine.LFD.0.9999.0711291442300.8458@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> For the fuzzy rename detection, we generate the full score matrix, and 
> sort it by the score, up front. So all the scoring - and more importantly, 
> all the sorting - has actually been done before we actually start looking 
> at *any* renames at all, so we cannot easily do the same thing I did for 
> the exact renames, namely to take into account _earlier_ renames in the 
> scoring. Because those earlier renames have simply not been done when the 
> score is calculated.

I think I've mentioned this before, but another thing we may want to do
is to give similarity boost to a src-dst pair if other files in the same
src directory are found to be renamed to the same dst directory.  That
is, if you have the same contents in the preimage at A/init.S and B/init.S,
and similar contents appear in C/init.S in the postimage, instead of
randomly picking A/init.S over B/init.S as the source, we can notice
that A/Makefile was moved to C/Makefile (but B/Makefile was sufficiently
different from A/Makefile in the preimage), and favor A/init.S over
B/init.S as the rename source of C/init.S.

About the code structure, I think the very early draft of rename
detector did not do the full matrix, but iterated over dst to see if
there is a good src for it, picked the best src that is above the
threshold, and went on to next dst, like this:

	for (dst in dst candidates) {
		best_src = NULL;
		best_score = minimum_score;
		for (src in src candidates) {
			score = similarity(dst, src);
			if (score > best_score) {
				best_src = src;
				best_score = score;
			}
		}
		if (best_src) {
			match dst with best_src;
		}
	}

This was restructured in the current "full matrix first" form before the
rename detection logic first hit your tree, and I do not think it was
shown in the field to perform worse than the full matrix version.
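
For anyone who wants to play with the per-destination loop described above, here is a minimal runnable sketch in Python. It is only an illustration: the similarity() stand-in uses difflib rather than git's own scoring, and the 0.5 threshold and the dict-of-contents layout are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; git's real scoring works differently."""
    return SequenceMatcher(None, a, b).ratio()


def match_renames(srcs: dict, dsts: dict, minimum_score: float = 0.5) -> dict:
    """For each destination, pick the best-scoring source above the threshold.

    srcs and dsts map path -> file contents (a hypothetical layout).
    Returns a dict mapping destination path -> chosen source path.
    """
    matches = {}
    for dst_path, dst_content in dsts.items():
        best_src, best_score = None, minimum_score
        for src_path, src_content in srcs.items():
            score = similarity(dst_content, src_content)
            if score > best_score:
                # Remember both the candidate and its score, so that a later,
                # weaker candidate cannot replace a stronger one.
                best_src, best_score = src_path, score
        if best_src is not None:
            matches[dst_path] = best_src
    return matches


srcs = {"a.c": "int main(void) { return 0; }\n", "b.txt": "hello world\n"}
dsts = {"c.c": "int main(void) { return 1; }\n"}
print(match_renames(srcs, dsts))
```

Note that nothing here stops two destinations from choosing the same source; deciding whether that is a copy or a rename would be a separate step.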

We could do the current full matrix that does not take basename
similarity nor what other renames were detected first, and then use that
matrix result in order to primarily define the order of dst candidates
to process and run the above loop.  At that point, similarity between
dst and src does not need to be recomputed fully (the matrix would
record it).  Instead, we can tweak it to take other renames that already
have been detected (this includes "this src has already been used", and
"somebody nearby moved to the same directory") and basename similarity
to affect which possible src candidate to choose for each dst.
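
A rough sketch of that second pass, assuming the matrix is available as a nested dict of raw scores on a 0..100 scale. The bonus and penalty values, the dict layout and the function name are all made up for illustration; none of this is taken from diffcore-rename.c.

```python
import posixpath


def pick_sources(matrix, minimum_score=50, used_penalty=5, dir_bonus=5, name_bonus=5):
    """Choose a source per destination from precomputed raw scores.

    matrix maps dst path -> {src path: raw score}. Raw scores are never
    recomputed here; they are only adjusted by what earlier picks revealed.
    """
    # The raw matrix only decides the order in which destinations are handled.
    order = sorted(matrix, key=lambda d: max(matrix[d].values(), default=0), reverse=True)
    used = set()        # sources already consumed by an earlier rename
    dir_moves = set()   # (source dir, destination dir) pairs seen so far
    result = {}
    for dst in order:
        best_src, best_adj = None, None
        for src, base in matrix[dst].items():
            if base < minimum_score:
                continue
            adj = base
            if src in used:
                adj -= used_penalty     # "this src has already been used"
            if (posixpath.dirname(src), posixpath.dirname(dst)) in dir_moves:
                adj += dir_bonus        # "somebody nearby moved to the same directory"
            if posixpath.basename(src) == posixpath.basename(dst):
                adj += name_bonus       # basename similarity
            if best_adj is None or adj > best_adj:
                best_src, best_adj = src, adj
        if best_src is not None:
            result[dst] = best_src
            used.add(best_src)
            dir_moves.add((posixpath.dirname(best_src), posixpath.dirname(dst)))
    return result


# The A/B/C example from earlier in this message, with invented scores.
matrix = {
    "C/init.S": {"B/init.S": 90, "A/init.S": 90},
    "C/Makefile": {"A/Makefile": 100, "B/Makefile": 40},
}
print(pick_sources(matrix))
```

With these numbers C/Makefile is handled first and pairs with A/Makefile, which then tips the tie for C/init.S towards A/init.S. Setting dir_bonus to 0 leaves the tie to be broken by iteration order.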


* Re: Adding push configuration to .git/config
From: Jakub Narebski @ 2007-11-30  0:37 UTC (permalink / raw)
  To: git
In-Reply-To: <7v1wa8vfee.fsf@gitster.siamese.dyndns.org>

Junio C Hamano wrote:

> IIRC, there was a suggestion to enhance remote.$name configuration in
> this way instead, so that you can use different URL for fetching and
> pushing:
> 
>         [branch "foo"]
>         remote = "there"
>         merge = refs/heads/master
> 
>         [remote "there"]
>         url = git://git.there.xz/repo.git
>         push_url = git.there.xz:repo.git
>         push_url = git.there.xz:backup.git
>         fetch = refs/heads/*:refs/remotes/there/*
> 
> I further vaguely recall that the comments on the alternative were
> positive (it might have been you who responded, or somebody else, I do
> not remember).

If I remember correctly one of the suggestions was to allow for multiple
URLs, and for fetch use _first_ one that responds, for push use _all_
that are _possible_ to push to. Or at least support multiple URLs for
push; this way you would have to configure separate remote for fetch and
for push, but you would have to push only once to push to all repos.

But push_url, or pushURL, seems like a better idea, IMHO.
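
To make the two ideas concrete, here is a small Python sketch of the selection rules. It assumes the remote section has already been parsed into a dict of lists; "push_url" is the key proposed in this thread, not something git is known to read, and the responds() probe is a hypothetical callback.

```python
def choose_fetch_url(remote: dict, responds) -> "str | None":
    """Return the first configured URL for which responds(url) is true."""
    for url in remote.get("url", []):
        if responds(url):
            return url
    return None


def push_targets(remote: dict) -> list:
    """Push goes to every push_url if any are set, else to every url."""
    return list(remote.get("push_url") or remote.get("url", []))


# The [remote "there"] section quoted above, as a parsed dict (assumed layout).
there = {
    "url": ["git://git.there.xz/repo.git"],
    "push_url": ["git.there.xz:repo.git", "git.there.xz:backup.git"],
}
print(choose_fetch_url(there, lambda url: True))
print(push_targets(there))
```

The fallback to url when no push_url is present is one possible reading of the proposal; it keeps existing single-URL configurations working unchanged.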

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Jeff King @ 2007-11-30  0:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Johannes Schindelin, Nguyen Thai Ngoc Duy,
	Jan Hudec, git
In-Reply-To: <7veje8twt2.fsf@gitster.siamese.dyndns.org>

On Thu, Nov 29, 2007 at 04:13:29PM -0800, Junio C Hamano wrote:

>  - Post v1.5.5, start cooking the change that does not install hardlinks
>    for built-in commands, aiming for inclusion in v1.5.6, in May-Jun
>    2008 timeframe.

I am still against this step, for the reasons mentioned in the mails
leading up to the one you just quoted. I am fine with "does not install
hardlinks for builtin-commands on systems that don't support hardlinks"
(and of course all such hardlinks are in $(libexecdir)/git-core at this
point).

-Peff


* Re: problem with git detecting proper renames
From: Jakub Narebski @ 2007-11-30  0:21 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.9999.0711290934260.8458@woody.linux-foundation.org>

Linus Torvalds wrote:
> On Thu, 29 Nov 2007, Kumar Gala wrote:
>> 
>> I did some git-mv and got the following:
>> 
>> the problem is git seems confused about what file was associated with its
>> source.
> 
> Well, I wouldn't say "confused". It found multiple identical options for 
> the source, and picked the first one (where "first one" may not be obvious 
> to a human, it can depend on an internal hash order).

By the way, which git version do you use? IIRC we have improved rename
detection heuristics to take into account similarity of filenames when
contents are identical...

...ah, I see, it is git 1.5.3.4

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: Adding Git to Better SCM Initiative : Comparison
From: Jakub Narebski @ 2007-11-30  0:18 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Robin Rosenberg
In-Reply-To: <20071129200710.GA3314@steel.home>

On Thu, 29 Nov 2007, Alex Riesen wrote:
> Jakub Narebski, Thu, Nov 29, 2007 03:26:12 +0100:

>> +                <s id="git">
>> +                    Medium. There's Git User's Manual, manpages, some
>> +                    technical documentation and some howtos.  All
>> +                    documentation is also available online in HTML format;
>> +                    there is additional information (including beginnings
>> +                    of FAQ) on git wiki.
>> +                    Nevertheles one of complaints in surveys is insufficient
> 
> "Nevertheless" (two "s").
> 
> BTW, I wouldn't call the level of documentation "Medium" when compared
> to any commercial SCM. How can they earn more than "a little", when
> compared to any opensource program?

Source code is not [user level] documentation.

But perhaps it should be "Good" instead of "Medium", although I think
not "Excellent".
 
>> @@ -894,6 +938,14 @@ TODO:
>>                      to install the subversion perl bindings and a few modules
>>                      from CPAN.
>>                  </s>
>> +                <s id="git">
>> +                    TO DO. RPMs and deb packages for Linux. msysGit and
>> +                    Cygwin for Win32 - Git requires POSIX shell, Perl,
>> +                    and POSIX utilities for some commands (builtin).
> 
> I read this as: "Git requires all these programs for builtin
> commands". Which is a bit confusing. Just drop "(builtin)"?

What I meant to say is that some Git commands are scripts in Perl or POSIX
shell, and that those Git commands require POSIX utilities (which of
those utilities are needed is unfortunately not mentioned explicitly
in the INSTALL file); _but_ that there is an ongoing effort to rewrite
mature commands in C (as built-ins).

But this is perhaps too long an explanation to put in this comparison
table.

>> +                    Autoconf to generate Makefile configuration; ready
>> +                    generic configuration for many OS. Compiling docs
>> +                    requires asciidoc and xmlto toolchain, but prebuild.
> 
> "prebuilt" (with "t"). Maybe remove ", but prebuilt" completely?

Gaaah, it should be "but you can get prebuilt docs".
 
>> @@ -1106,6 +1165,10 @@ TODO:
>>                      There exists some HTTP-functionality, but it is quite
>>                      limited.
>>                  </s>
>> +                <s id="git">
>> +                    Good.  Uses HTTPS (with WebDAV) or ssh for push,
>> +                    HTTP, FTP, ssh or custom protocol for fetch.
>> +                </s>
> 
> You forgot bundles (aka SneakerNet).
> Again, compared to everyone else it is "vastly superior" :)

Bundles and patches (peer review!) I think truly move it from "Good"
to "Excellent".

>>                  <s id="mercurial">
>>                      Excellent.  Uses HTTP or ssh.  Remote access also
>>                      works safely without locks over read-only network

By the way, can Git be used with a repository on a lockless network
filesystem? (Although with a distributed SCM it would perhaps be better
to just use many distributed repositories...). How does it work
with a repository available via SMBFS / CIFS or NFS?

>> @@ -1203,6 +1266,10 @@ TODO:
>>                      Very good. Supports many UNIXes, Mac OS X, and Windows,
>>                      and is written in a portable language.
>>                  </s>
>> +                <s id="git">TO DO.
>> +                    Good.  Portable across all POSIX systems.
>> +                    There exists Win32 binary using MinGW.
>> +                </s>
> 
> "binaries": MinGW and Cygwin. And it is definitely "excellent" by the
> standards of the site.

I'd say excellent on POSIX systems, good on Win32 (there are still,
as far as I remember, some troubles). I hope that the gitbox project
succeeds, and that one will need only a single binary (plus perhaps wish
for GUI, and DLLs) to use git on MS Windows.

-- 
Jakub Narebski
Poland


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Junio C Hamano @ 2007-11-30  0:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Johannes Schindelin, Nguyen Thai Ngoc Duy, Jan Hudec,
	git
In-Reply-To: <alpine.LFD.0.9999.0711291527090.8458@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And from a consistency standpoint, that would be a *good* thing. There are 
> many reasons why the git-xyz format *cannot* be the "consistent" form
> (ranging from the flags like --bare and -p to just aliases), so 
> encouraging people to move to "git xyz" is just a good idea.
>
> Yeah, yeah, the man-pages need the "git-xyz" form, but on the other hand, 
> rather than "man git-xyz", you can just do "git help xyz" instead, and now 
> you're consistently avoiding the dash again!

Ok.  So here is a revised roadmap that a panda brain (that is not
working so well today due to fever) came up with.

 - v1.5.4 will ship with gitexecdir=$(bindir) in Makefile.  But the
   release notes for the version will warn users that:

   (1) using git-foo from the command line, and

   (2) using git-foo from your scripts without first prepending the
       return value of "git --exec-path" to the PATH

   is now officially deprecated (it has been deprecated for a long time
   since January 2006, v1.2.0~149) and upcoming v1.5.5 will ship with
   the default configuration that does not install git-foo form in
   user's PATH.

   It will further warn users that git-foo form will be removed in
   v1.5.6 for many commands and it will be merely an accident if some of
   them still work after that.

 - Post v1.5.4, start cooking gitexecdir=$(libexecdir)/git-core, aiming
   for inclusion in v1.5.5, perhaps in Feb-Mar 2008 timeframe.  This
   will also affect the sample RPM spec and resulting RPM binary
   packages I will place on k.org, and I'll ask Gerrit to do the same on
   Debian side.  The official binary packaging of individual distros are
   not under my control, but if there is a handy list of people I can
   send this notice to for other distros, that would help this process.

 - The release notes for v1.5.5 will warn users again that git-foo will
   be removed in v1.5.6 for many commands and it will be merely an
   accident if some of them still work.

 - Post v1.5.5, start cooking the change that does not install hardlinks
   for built-in commands, aiming for inclusion in v1.5.6, in May-Jun
   2008 timeframe.
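
For script authors wondering what "prepending the return value of git --exec-path to the PATH" looks like in practice, here is a minimal Python sketch. The helper names are invented for the example; the only git interface it relies on is "git --exec-path" itself.

```python
import os
import subprocess


def git_exec_path() -> str:
    """Ask git where its dashed-form helper programs live."""
    out = subprocess.run(
        ["git", "--exec-path"], check=True, capture_output=True, text=True
    )
    return out.stdout.strip()


def with_exec_path(env: dict, exec_path: str) -> dict:
    """Return a copy of env with exec_path placed first in PATH."""
    new_env = dict(env)
    old_path = new_env.get("PATH", "")
    new_env["PATH"] = exec_path + (os.pathsep + old_path if old_path else "")
    return new_env


# Typical use in a script that still calls git-foo directly:
#   env = with_exec_path(os.environ, git_exec_path())
#   subprocess.run(["git-ls-files"], env=env, check=True)
```

Switching the script to the "git foo" form avoids the need for any of this.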


* Re: Fix a pathological case in git detecting proper renames
From: Jeff King @ 2007-11-29 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kumar Gala, Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.9999.0711291442300.8458@woody.linux-foundation.org>

On Thu, Nov 29, 2007 at 03:03:06PM -0800, Linus Torvalds wrote:

> This would probably become easier to do with the linear-time hash-based 
> similarity engine (the stuff Jeff King was working on), but the way the 
> code is currently structured - with no incremental rename detection at 
> all, and with all the scoring in one global table - it's pretty painful.

I think it will get worse, because you are simultaneously calculating
all of the similarity scores bit by bit rather than doing a loop. Though
perhaps you mean at the end you will end up with a list of src/dst pairs
sorted by score, and you can loop over that.

-Peff


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Linus Torvalds @ 2007-11-29 23:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Nguyen Thai Ngoc Duy, Junio C Hamano,
	Jan Hudec, git
In-Reply-To: <20071129231444.GA9616@coredump.intra.peff.net>



On Thu, 29 Nov 2007, Jeff King wrote:
> 
> Yes, I am fine with the user having to go to extra lengths to use the
> dash forms (like adding $(libexecdir) to their path), which I think
> should address your consistency concern.

I agree. If we actually start moving the subcommands into a separate 
directory, I suspect scripts will be fixed up soon enough. Of course 
people *can* do it by just adding the path, but more likely, we'll just 
see people start doing "git xyz" instead of "git-xyz".

And from a consistency standpoint, that would be a *good* thing. There are 
many reasons why the git-xyz format *cannot* be the "consistent" form
(ranging from the flags like --bare and -p to just aliases), so 
encouraging people to move to "git xyz" is just a good idea.

Yeah, yeah, the man-pages need the "git-xyz" form, but on the other hand, 
rather than "man git-xyz", you can just do "git help xyz" instead, and now 
you're consistently avoiding the dash again!

			Linus


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Jeff King @ 2007-11-29 23:14 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Nguyen Thai Ngoc Duy, Junio C Hamano, Jan Hudec, git
In-Reply-To: <Pine.LNX.4.64.0711292218240.27959@racer.site>

On Thu, Nov 29, 2007 at 10:19:16PM +0000, Johannes Schindelin wrote:

> > I think that is totally reasonable, as on those platforms there is
> > actually something to be gained from removing those hardlinks (you could
> 
> Note that one big problem with a few platforms having dash forms and 
> others not is that you _will_ get scripts and aliases that do not work 
> everywhere.
> 
> Consistency is good.

Yes, I am fine with the user having to go to extra lengths to use the
dash forms (like adding $(libexecdir) to their path), which I think
should address your consistency concern.

-Peff


* Re: Fix a pathological case in git detecting proper renames
From: Linus Torvalds @ 2007-11-29 23:03 UTC (permalink / raw)
  To: Kumar Gala, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <alpine.LFD.0.9999.0711291303000.8458@woody.linux-foundation.org>



On Thu, 29 Nov 2007, Linus Torvalds wrote:
> 
> It's worth noting a few gotchas:
> 
>  - this scoring is currently only done for the "exact match" case. 
> 
>    In particular, in Kumar's example, even after this patch, the inexact
>    match case is still done as a copy+delete rather than as two renames:
> 
> 	 delete mode 100644 board/cds/mpc8555cds/u-boot.lds
> 	 copy board/{cds => freescale}/mpc8541cds/u-boot.lds (97%)
> 	 rename board/{cds/mpc8541cds => freescale/mpc8555cds}/u-boot.lds (97%)
> 
>    because apparently the "cds/mpc8541cds/u-boot.lds" copy looked 
>    a bit more similar to both end results. That said, I *suspect* we just 
>    have the exact same issue there - the similarity analysis just gave 
>    identical (or at least very _close_ to identical) similarity points, 
>    and we do not have any logic to prefer multiple renames over a 
>    copy/delete there.
> 
>    That is a separate patch.

Side note: just in case people were expecting me to actually _ship_ that 
separate patch that handles the fuzzy matches too.. I wasn't planning on 
doing that patch. The way the fuzzy rename detection is currently done, 
that's actually quite painful.

For the fuzzy rename detection, we generate the full score matrix, and 
sort it by the score, up front. So all the scoring - and more importantly, 
all the sorting - has actually been done before we actually start looking 
at *any* renames at all, so we cannot easily do the same thing I did for 
the exact renames, namely to take into account _earlier_ renames in the 
scoring. Because those earlier renames have simply not been done when the 
score is calculated.

This would probably become easier to do with the linear-time hash-based 
similarity engine (the stuff Jeff King was working on), but the way the 
code is currently structured - with no incremental rename detection at 
all, and with all the scoring in one global table - it's pretty painful.

			Linus


* Re: Adding push configuration to .git/config
From: Junio C Hamano @ 2007-11-29 22:46 UTC (permalink / raw)
  To: Nico -telmich- Schottelius; +Cc: Johannes Schindelin, Steffen Prohaska, git
In-Reply-To: <20071128221559.GC22395@denkbrett.schottelius.org>

Nico -telmich- Schottelius <nico-linux-git@schottelius.org> writes:

> ...
> [branch "otherbranch"]
>    merge = otherremote
>    push = otherremote
>    push = classmate
>    push = myremote
> --------------------------------------------------------------------------------
>
> What do you think about that approach?

Huh?

Is this a reinjection of an ancient message by some gateway?

You were already told branch.$name.merge has a defined meaning and
syntax, and you cannot make it refer to a remote shorthand without
breaking an existing setup.

Also if you want to have more than one destination repository for a
single push, I think that is already supported with remote.$name.url.

IIRC, there was a suggestion to enhance remote.$name configuration in
this way instead, so that you can use different URL for fetching and
pushing:

	[branch "foo"]
        remote = "there"
        merge = refs/heads/master

	[remote "there"]
        url = git://git.there.xz/repo.git
        push_url = git.there.xz:repo.git
        push_url = git.there.xz:backup.git
	fetch = refs/heads/*:refs/remotes/there/*

I further vaguely recall that the comments on the alternative were
positive (it might have been you who responded, or somebody else, I do
not remember).


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Junio C Hamano @ 2007-11-29 22:36 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Nguyen Thai Ngoc Duy, Jan Hudec, Johannes Schindelin, git
In-Reply-To: <alpine.LFD.0.99999.0711290905510.9605@xanadu.home>

Nicolas Pitre <nico@cam.org> writes:

> On Thu, 29 Nov 2007, Nguyen Thai Ngoc Duy wrote:
>
>> There won't be a stage when only porcelain git-foos are in $(bindir)?
>> I could stop working on the relevant patch then.
>
> Well, I personally found your effort really nice.  I think Junio is 
> overly cautious in this case, and I would prefer to see the number of 
> git commands in the default path drop rather sooner than later.

I agree with the first sentence.  And yes I am playing it safe, and at
the same time I do not think the "default" really matters as much as
people think.

If people are really serious about reducing the number of commands in
the path, I would expect fixes and bugreports saying "I am setting
gitexecdir different from bindir in _my_ installation when I build git,
and here are the things that do not work if I do so".  Within the span
of more than 20 months (77cb17e9 introduced gitexecdir in Jan 2006), I
do not think there was a single such report or patch, other than the
message from Nguyen that started this thread.

Which means one of two things (1) we got everything right and there is
nothing to fix, other than changing the default like Nguyen's patch
does, or (2) nobody is interested in moving git-foo out of their PATH
for _his_ own use, but pushing changes that would affect _other_ people
without testing.

I am of course hoping that (1) is the case.  And it could be that in
open-source settings often the silent majority is content with what's
already there, and that many people who are indeed interested in moving
git-foo out of their PATH are doing so happily without telling the
others of their success.

But it still worries me.

And people's scripts, especially old/unmaintained ones that google still
knows about, are worrisome too.  Didn't we just see a message that says
"git-update-cache in a script I picked up from google does not work" on
the list?


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Johannes Schindelin @ 2007-11-29 22:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyen Thai Ngoc Duy, Junio C Hamano, Jan Hudec, git
In-Reply-To: <20071129211409.GA16625@sigill.intra.peff.net>

Hi,

On Thu, 29 Nov 2007, Jeff King wrote:

> On Fri, Nov 30, 2007 at 03:05:05AM +0700, Nguyen Thai Ngoc Duy wrote:
> 
> > > But I don't see a point to removing the links entirely. The annoyance
> > > factor for people who want git-* is much higher, and I don't see that it
> > > actually buys us any help for new users (who will no longer care after
> > > everything is hidden in $(libexecdir) anyway).
> > 
> > Maybe only not install hardlinks on systems that do not support it
> > like Windows? git.exe duplication takes a lot of space.
> 
> I think that is totally reasonable, as on those platforms there is
> actually something to be gained from removing those hardlinks (you could
> also of course make a very thin wrapper for "git-foo" that called "git
> foo"; it would still be wasteful, but not as much as copying the whole
> git.exe. But that is not worth doing unless people on Windows really
> want the dash forms).

Note that one big problem with a few platforms having dash forms and 
others not is that you _will_ get scripts and aliases that do not work 
everywhere.

Consistency is good.

Ciao,
Dscho


* importing bk into git
From: Christoph @ 2007-11-29 21:32 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]

I am trying to import a BitKeeper repo into a (new) git repo.

I am trying with the script bk2git.py that I found on the web.
This does not quite work - I fear the script no longer works with the current 
git release. (I am using the current git release.)

If I have understood the script correctly, it does repeated bk checkouts and 
imports the diff of each (next) checkout into the git repo.

It seems this script tries to do so by setting environment vars
GIT_OBJECT_DIRECTORY and GIT_INDEX_FILE
to point at the git repo.

The bk checkouts are done in a temp. dir (tmp_dir).


The following line fails
  os.system("cd %s; git-ls-files --deleted | xargs git-update-cache --remove" % tmp_dir)

with: fatal: Not a git repository
xargs: git-update-cache: No such file or directory

The problem seems to be that the script cd's into the temp dir (which is not a 
git repo) and the git-ls-files fails to find a git repo there.
I think the issue might be that an earlier version of git was perhaps able to 
find the repo by means of the env. vars mentioned above.

Any idea if/how I can fix this?
Thanks for any ideas and best regards

Christoph
(Sorry, my python and git skills are so far very limited.)
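
One possible direction, written as a hedged Python sketch and not tested against the attached script: git-update-cache has since been renamed to git-update-index, and git run from a directory that is not a repository needs GIT_DIR to tell it where the repository is. The helper names and the idea that the script does not already set GIT_DIR are assumptions.

```python
import os
import subprocess


def git_env(git_dir: str, base_env=None) -> dict:
    """Copy the environment and point GIT_DIR at the repository.

    Any GIT_INDEX_FILE or GIT_OBJECT_DIRECTORY already present is kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_DIR"] = git_dir
    return env


def remove_deleted_commands():
    """Commands replacing 'git-ls-files --deleted | xargs git-update-cache --remove'."""
    list_cmd = ["git", "ls-files", "--deleted", "-z"]
    update_cmd = ["git", "update-index", "--remove", "-z", "--stdin"]
    return list_cmd, update_cmd


def drop_deleted(tmp_dir: str, git_dir: str) -> None:
    """Remove from the index every tracked file missing under tmp_dir."""
    env = git_env(git_dir)
    list_cmd, update_cmd = remove_deleted_commands()
    deleted = subprocess.run(
        list_cmd, cwd=tmp_dir, env=env, check=True, capture_output=True
    ).stdout
    if deleted:
        # NUL-separated paths are fed straight to update-index on stdin.
        subprocess.run(update_cmd, cwd=tmp_dir, env=env, input=deleted, check=True)
```

Using -z with --stdin instead of xargs also keeps file names containing spaces intact.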

PS: I have attached the script I downloaded from the net.
-- 
FORTUNE'S PARTY TIPS		#14

Tired of finding that other people are helping themselves to your good
liquor at BYOB parties?  Take along a candle, which you insert and
light after you've opened the bottle.  No one ever expects anything
drinkable to be in a bottle which has a candle stuck in its neck.

[-- Attachment #2: bk2git.py --]
[-- Type: application/x-python, Size: 4700 bytes --]


* Fix a pathological case in git detecting proper renames
From: Linus Torvalds @ 2007-11-29 21:30 UTC (permalink / raw)
  To: Kumar Gala, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <41CB0B7D-5AC1-4703-BA99-21622A410F93@kernel.crashing.org>



Kumar Gala had a case in the u-boot archive with multiple renames of files 
with identical contents, and git would turn those into multiple "copy" 
operations of one of the sources, and just deleting the other sources.

This patch makes the git exact rename detection prefer to spread out the 
renames over the multiple sources, rather than do multiple copies of one 
source.

NOTE! The changes are a bit larger than required, because I also renamed 
the variables named "one" and "two" to "target" and "source" respectively. 
That makes the logic easier to follow, especially as the "one" was 
illogically the target and not the source, for purely historical reasons 
(this piece of code used to traverse over sources and targets in the wrong 
order, and when we fixed that, we didn't fix the names back then. So I 
fixed them now).

The important part of this change is just the trivial score calculations 
for when files have identical contents:

	/* Give higher scores to sources that haven't been used already */
	score = !source->rename_used;
	score += basename_same(source, target);

and when we have multiple choices we'll now pick the choice that gets the 
best rename score, rather than only looking at whether the basename 
matched.
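
As a toy model of that selection (not the C code from the patch below), the following Python sketch scores identical-content sources the same way. The Source class and the assign() driver are invented for the example, and the "too many alternatives" cutoff from the real loop is left out.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Source:
    path: str
    rename_used: bool = False


def basename_same(src_path: str, dst_path: str) -> bool:
    return posixpath.basename(src_path) == posixpath.basename(dst_path)


def best_source(sources, target_path):
    """Pick the identical-content source with the highest score (0..2)."""
    best, best_score = None, -1
    for src in sources:
        # One point for a source not used yet, one for a matching basename.
        score = int(not src.rename_used) + int(basename_same(src.path, target_path))
        if score > best_score:
            best, best_score = src, score
            if score == 2:      # cannot do better, stop early
                break
    return best


def assign(sources, targets):
    """Pair each target with a source, marking sources as they get used."""
    pairs = []
    for target in targets:
        src = best_source(sources, target)
        if src is not None:
            pairs.append((src.path, target))
            src.rename_used = True
    return pairs


sources = [Source("board/cds/mpc8541cds/init.S"), Source("board/cds/mpc8548cds/init.S")]
targets = ["board/freescale/mpc8541cds/init.S", "board/freescale/mpc8548cds/init.S"]
for src, dst in assign(sources, targets):
    print(src, "=>", dst)
```

The second target skips the first source because it is already marked as used, so the two renames spread over both sources.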

It's worth noting a few gotchas:

 - this scoring is currently only done for the "exact match" case. 

   In particular, in Kumar's example, even after this patch, the inexact
   match case is still done as a copy+delete rather than as two renames:

	 delete mode 100644 board/cds/mpc8555cds/u-boot.lds
	 copy board/{cds => freescale}/mpc8541cds/u-boot.lds (97%)
	 rename board/{cds/mpc8541cds => freescale/mpc8555cds}/u-boot.lds (97%)

   because apparently the "cds/mpc8541cds/u-boot.lds" copy looked 
   a bit more similar to both end results. That said, I *suspect* we just 
   have the exact same issue there - the similarity analysis just gave 
   identical (or at least very _close_ to identical) similarity points, 
   and we do not have any logic to prefer multiple renames over a 
   copy/delete there.

   That is a separate patch.

 - When you have identical contents and identical basenames, the actual 
   entry that is chosen is still picked fairly "at random" for the first 
   one (but the subsequent ones will prefer entries that haven't already 
   been used).

   It's not actually really random, in that it actually depends on the
   relative alphabetical order of the files (which in turn will have 
   impacted the order that the entries got hashed!), so it gives 
   consistent results that can be explained. But I wanted to point it out 
   as an issue for when anybody actually does cross-renames.

   In Kumar's case the choice is the right one (and for a single normal 
   directory rename it should always be, since the relative alphabetical 
   sorting of the files will be identical), and we now get:

	 rename board/{cds => freescale}/mpc8541cds/init.S (100%)
	 rename board/{cds => freescale}/mpc8548cds/init.S (100%)

   which is the "expected" answer. However, it might still be better to 
   change the pedantic "exact same basename" on/off choice into a more 
   graduated "how similar are the pathnames" scoring situation, in order 
   to be more likely to get the exact rename choice that people *expect* 
   to see, rather than other alternatives that may *technically* be 
   equally good, but are surprising to a human.

It's also unclear whether we should consider "basenames are equal" or 
"have already used this as a source" to be more important. This gives them 
equal weight, but I suspect we might want to just multiply the "basenames 
are equal" weight by two, or something, to prefer equal basenames even if 
that causes a copy/delete pair. I dunno.
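
The trade-off can be seen with a two-candidate example. This is a sketch only; the basename_weight parameter is hypothetical and is not part of the patch.

```python
def score(rename_used: bool, same_basename: bool, basename_weight: int = 1) -> int:
    """Score a candidate source; higher is better."""
    return int(not rename_used) + basename_weight * int(same_basename)


# Candidate A: already used as a rename source, but the basename matches.
# Candidate B: not used yet, but the basename differs.
for weight in (1, 2):
    a = score(rename_used=True, same_basename=True, basename_weight=weight)
    b = score(rename_used=False, same_basename=False, basename_weight=weight)
    print("weight", weight, "A:", a, "B:", b)
```

With weight 1 the two candidates tie and whichever is seen first wins; with weight 2 the matching basename wins even though that reuses a source, which is the copy/delete outcome.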

Anyway, what I'm just saying in a really long-winded manner is that I 
think this is right as-is, but it's not the complete solution, and it may 
want some further tweaking in the future.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Thu, 29 Nov 2007, Kumar Gala wrote:
> 
> let me know if there is anything else you need.

No, this was all right, and I've already got a patch ready for you to try.

So this patch actually does do what you want (for the exact renames, if 
not for the u-boot.lds file), but I wanted to just point out that we will 
almost certainly at least want to extend it to the inexact rename 
detection logic too, _and_ we may well want to make the "score" 
calculation a bit more involved depending on the actual filename, rather 
than just depend on the equality of the basename.

		Linus

---
 diffcore-rename.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f9ebea5..f64294e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -244,28 +244,35 @@ static int find_identical_files(struct file_similarity *src,
 	 * Walk over all the destinations ...
 	 */
 	do {
-		struct diff_filespec *one = dst->filespec;
+		struct diff_filespec *target = dst->filespec;
 		struct file_similarity *p, *best;
-		int i = 100;
+		int i = 100, best_score = -1;
 
 		/*
 		 * .. to find the best source match
 		 */
 		best = NULL;
 		for (p = src; p; p = p->next) {
-			struct diff_filespec *two = p->filespec;
+			int score;
+			struct diff_filespec *source = p->filespec;
 
 			/* False hash collission? */
-			if (hashcmp(one->sha1, two->sha1))
+			if (hashcmp(source->sha1, target->sha1))
 				continue;
 			/* Non-regular files? If so, the modes must match! */
-			if (!S_ISREG(one->mode) || !S_ISREG(two->mode)) {
-				if (one->mode != two->mode)
+			if (!S_ISREG(source->mode) || !S_ISREG(target->mode)) {
+				if (source->mode != target->mode)
 					continue;
 			}
-			best = p;
-			if (basename_same(one, two))
-				break;
+			/* Give higher scores to sources that haven't been used already */
+			score = !source->rename_used;
+			score += basename_same(source, target);
+			if (score > best_score) {
+				best = p;
+				best_score = score;
+				if (score == 2)
+					break;
+			}
 
 			/* Too many identical alternatives? Pick one */
 			if (!--i)

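As a rough illustration of the scoring used in the patch above, here is a minimal standalone sketch in C. It is not the diffcore-rename.c code: `struct candidate`, `base_name()` and `pick_best_source()` are hypothetical names invented for this example, and every candidate is assumed to have already passed the content (SHA-1) and mode checks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct diff_filespec. */
struct candidate {
	const char *path;	/* path of a source with identical content */
	int rename_used;	/* non-zero once it was picked for a rename */
};

/* Return the part of the path after the last '/', or the whole path. */
static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Pick the best source for target_path among n identical candidates.
 * A candidate scores one point for not having been used yet and one
 * point for sharing the target's basename. Returns the index of the
 * first candidate with the highest score, or -1 if n is 0.
 */
static int pick_best_source(const struct candidate *src, size_t n,
			    const char *target_path)
{
	int best = -1, best_score = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int score = !src[i].rename_used;
		score += !strcmp(base_name(src[i].path),
				 base_name(target_path));
		if (score > best_score) {
			best = (int)i;
			best_score = score;
			if (score == 2)	/* cannot do better, stop early */
				break;
		}
	}
	return best;
}
```

With two sources that share the target's basename, the one that has not been used yet wins, which is what avoids picking the same source twice.
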

* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Jeff King @ 2007-11-29 21:14 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, Jan Hudec, Johannes Schindelin, git
In-Reply-To: <fcaeb9bf0711291205h125dadbbp8e8ae392e9b5b751@mail.gmail.com>

On Fri, Nov 30, 2007 at 03:05:05AM +0700, Nguyen Thai Ngoc Duy wrote:

> > But I don't see a point to removing the links entirely. The annoyance
> > factor for people who want git-* is much higher, and I don't see that it
> > actually buys us any help for new users (who will no longer care after
> > everything is hidden in $(libexecdir) anyway).
> 
> Maybe only skip installing hardlinks on systems that do not support
> them, like Windows? Duplicating git.exe takes a lot of space.

I think that is totally reasonable, as on those platforms there is
actually something to be gained from removing those hardlinks. You could
also, of course, make a very thin wrapper for "git-foo" that calls "git
foo"; it would still be wasteful, but not as much as copying the whole
git.exe. But that is not worth doing unless people on Windows really
want the dash forms.

-Peff
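
For illustration only, the argument rewriting such a thin wrapper would need can be sketched in C as below. `dashed_to_git_argv()` is a hypothetical helper made up for this example; nothing like it is proposed in the thread, and how the resulting vector would be executed (execvp() on POSIX, some spawn call on Windows) is left open.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Turn the argv of a program invoked as "git-<cmd>" into a new
 * NULL-terminated vector { "git", "<cmd>", args... }.
 * Returns NULL if argv[0] is not of the form git-<cmd> or if the
 * allocation fails. The caller frees the returned array; the strings
 * themselves are borrowed from argv.
 */
static char **dashed_to_git_argv(int argc, char **argv)
{
	static char git_name[] = "git";
	const char *slash;
	char *name;
	char **out;
	int i;

	if (argc < 1 || !argv[0])
		return NULL;
	/* Strip any leading directory from the invocation name. */
	slash = strrchr(argv[0], '/');
	name = slash ? argv[0] + (slash - argv[0]) + 1 : argv[0];
	if (strncmp(name, "git-", 4) || !name[4])
		return NULL;

	/* Room for "git", the subcommand, argc - 1 arguments and NULL. */
	out = malloc(sizeof(*out) * ((size_t)argc + 2));
	if (!out)
		return NULL;
	out[0] = git_name;
	out[1] = name + 4;
	for (i = 1; i < argc; i++)
		out[i + 1] = argv[i];
	out[argc + 1] = NULL;
	return out;
}
```

A wrapper's main() would pass the result to whatever process-launching call the platform offers; that part is deliberately not shown here.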


* [PATCH] Add "--expire <time>" option to 'git prune'
From: Johannes Schindelin @ 2007-11-29 20:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git, pasky
In-Reply-To: <7vlk8gvmts.fsf@gitster.siamese.dyndns.org>


Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.

This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time.  For example, by

	git prune --expire 14.days

you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).

The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better: the test sets the mtime with
test-chmtime, since 'touch -d <time>' is not portable.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Thu, 29 Nov 2007, Junio C Hamano wrote:

	> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
	> 
	> > The implementation uses st.st_mtime rather than st.st_ctime,
	> > because it can be tested better, using 'touch -d <time>' (and
	> > omitting the test when the platform does not support that
	> > command line switch).
	> 
	> I think you could use the more portable -t to set the mtime to
	> 1970/01/01, but I have a feeling that we were bitten by the
	> non-portability of "touch" earlier and introduced test-chmtime.

	Somehow that slipped by me.  This patch uses test-chmtime.

 Documentation/git-prune.txt |    5 ++++-
 builtin-prune.c             |   21 ++++++++++++++++++++-
 t/t1410-reflog.sh           |   17 +++++++++++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index 0ace233..9835bdb 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -8,7 +8,7 @@ git-prune - Prune all unreachable objects from the object database
 
 SYNOPSIS
 --------
-'git-prune' [-n] [--] [<head>...]
+'git-prune' [-n] [--expire <time>] [--] [<head>...]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,9 @@ OPTIONS
 \--::
 	Do not interpret any more arguments as options.
 
+\--expire <time>::
+	Only expire loose objects older than <time>.
+
 <head>...::
 	In addition to objects
 	reachable from any of our references, keep objects
diff --git a/builtin-prune.c b/builtin-prune.c
index 44df59e..b5e7684 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -7,15 +7,24 @@
 
 static const char prune_usage[] = "git-prune [-n]";
 static int show_only;
+static unsigned long expire;
 
 static int prune_object(char *path, const char *filename, const unsigned char *sha1)
 {
+	const char *fullpath = mkpath("%s/%s", path, filename);
+	if (expire) {
+		struct stat st;
+		if (lstat(fullpath, &st))
+			return error("Could not stat '%s'", fullpath);
+		if (st.st_mtime > expire)
+			return 0;
+	}
 	if (show_only) {
 		enum object_type type = sha1_object_info(sha1, NULL);
 		printf("%s %s\n", sha1_to_hex(sha1),
 		       (type > 0) ? typename(type) : "unknown");
 	} else
-		unlink(mkpath("%s/%s", path, filename));
+		unlink(fullpath);
 	return 0;
 }
 
@@ -85,6 +94,16 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			show_only = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--expire")) {
+			if (++i < argc) {
+				expire = approxidate(argv[i]);
+				continue;
+			}
+		}
+		else if (!prefixcmp(arg, "--expire=")) {
+			expire = approxidate(arg + 9);
+			continue;
+		}
 		usage(prune_usage);
 	}
 
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 12a53ed..4a17573 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -201,4 +201,21 @@ test_expect_success 'delete' '
 	! grep dragon < output
 '
 
+test_expect_success 'prune --expire' '
+
+	BLOB=$(echo aleph | git hash-object -w --stdin) &&
+	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	git reset --hard &&
+	git prune --expire=1.hour.ago &&
+	test 20 = $(git count-objects | sed "s/ .*//") &&
+	test -f $BLOB_FILE &&
+	test-chmtime -86400 $BLOB_FILE &&
+	git prune --expire 1.day &&
+	test 19 = $(git count-objects | sed "s/ .*//") &&
+	! test -f $BLOB_FILE
+
+'
+
 test_done
-- 
1.5.3.6.2088.g8c260


* Re: problem with git detecting proper renames
From: Kumar Gala @ 2007-11-29 20:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <alpine.LFD.0.9999.0711291122050.8458@woody.linux-foundation.org>

> This is why I'd like to have a real-life example. I can change the
> heuristics, and I even know what are likely to be better heuristics,
> but I still want to actually see and play with an example so that when
> I send Junio a patch, I can explain it and say I've tested it with
> something real..

Ok, here's the tree:

         git.kernel.org:/pub/scm/boot/u-boot/galak/u-boot.git linus_git

and the commit that is doing the file movement is:

ba30ae3cc2e92d4b2362fbc01bedb659615e123e

let me know if there is anything else you need.

- k


* git-fetch ansi control sequences
From: Romain Francoise @ 2007-11-29 20:10 UTC (permalink / raw)
  To: git

Running 'git fetch' in a dumb terminal (like an Emacs shell buffer)
now outputs raw control sequences, which isn't particularly pretty:

| remote: Generating pack...^[[K
| remote: Done counting 4044 objects.^[[K
| remote: Result has 2577 objects.^[[K
| remote: Deltifying 2577 objects...^[[K
| remote:
| remote: Total 2577 (delta 2150), reused 2457 (delta 2044)^[[K

It would be easy to make recv_sideband() check if the terminal is
dumb, but I have a feeling that the sideband code shouldn't have to
know about such things, being rather generic.

What do people think?
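
To make the "check if the terminal is dumb" idea concrete, here is a minimal sketch in C. The helper names are hypothetical, and padding with spaces on dumb terminals is only an assumption made for this example, not what the sideband code does.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 when TERM is unset or set to "dumb", i.e. when escape
 * sequences should not be written to the terminal.
 */
static int terminal_is_dumb(void)
{
	const char *term = getenv("TERM");
	return !term || !strcmp(term, "dumb");
}

/*
 * Suffix to append to each progress line: the "erase to end of line"
 * sequence (ESC [ K) on capable terminals, plain padding otherwise.
 */
static const char *line_suffix(void)
{
	return terminal_is_dumb() ? "        " : "\033[K";
}
```

Where such a check should live (in recv_sideband() itself or in its caller) is exactly the open question above; the sketch takes no side on that.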


* Re: Adding Git to Better SCM Initiative : Comparison
From: Alex Riesen @ 2007-11-29 20:07 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Robin Rosenberg
In-Reply-To: <200711290326.13822.jnareb@gmail.com>

Jakub Narebski, Thu, Nov 29, 2007 03:26:12 +0100:
> +                <s id="git">
> +                    Medium. There's Git User's Manual, manpages, some
> +                    technical documentation and some howtos.  All
> +                    documentation is also available online in HTML format;
> +                    there is additional information (including beginnings
> +                    of FAQ) on git wiki.
> +                    Nevertheles one of complaints in surveys is insufficient

"Nevertheless" (two "s").

BTW, I wouldn't call the level of documentation "Medium" when compared
to any commercial SCM. How can they earn more than "a little" when
compared to any open source program?

> @@ -894,6 +938,14 @@ TODO:
>                      to install the subversion perl bindings and a few modules
>                      from CPAN.
>                  </s>
> +                <s id="git">
> +                    TO DO. RPMs and deb packages for Linux. msysGit and
> +                    Cygwin for Win32 - Git requires POSIX shell, Perl,
> +                    and POSIX utilities for some commands (builtin).

I read this as: "Git requires all these programs for builtin
commands". Which is a bit confusing. Just drop "(builtin)"?

> +                    Autoconf to generate Makefile configuration; ready
> +                    generic configuration for many OS. Compiling docs
> +                    requires asciidoc and xmlto toolchain, but prebuild.

"prebuilt" (with "t"). Maybe remove ", but prebuilt" completely?

> @@ -1106,6 +1165,10 @@ TODO:
>                      There exists some HTTP-functionality, but it is quite
>                      limited.
>                  </s>
> +                <s id="git">
> +                    Good.  Uses HTTPS (with WebDAV) or ssh for push,
> +                    HTTP, FTP, ssh or custom protocol for fetch.
> +                </s>

You forgot bundles (aka SneakerNet).
Again, compared to everyone else it is "vastly superior" :)

>                  <s id="mercurial">
>                      Excellent.  Uses HTTP or ssh.  Remote access also
>                      works safely without locks over read-only network
> @@ -1203,6 +1266,10 @@ TODO:
>                      Very good. Supports many UNIXes, Mac OS X, and Windows,
>                      and is written in a portable language.
>                  </s>
> +                <s id="git">TO DO.
> +                    Good.  Portable across all POSIX systems.
> +                    There exists Win32 binary using MinGW.
> +                </s>

"binaries": MinGW and Cygwin. And it is definitely "excellent" by the
standards of the site.


* Re: [PATCH] Add "--expire <time>" option to 'git prune'
From: Junio C Hamano @ 2007-11-29 20:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Johannes Sixt, git, pasky
In-Reply-To: <Pine.LNX.4.64.0711291419350.27959@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The implementation uses st.st_mtime rather than st.st_ctime,
> because it can be tested better, using 'touch -d <time>' (and
> omitting the test when the platform does not support that
> command line switch).

I think you could use the more portable -t to set the mtime to
1970/01/01, but I have a feeling that we were bitten by the
non-portability of "touch" earlier and introduced test-chmtime.


* Re: [PATCH] Move all dashed form git commands to libexecdir
From: Nguyen Thai Ngoc Duy @ 2007-11-29 20:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jan Hudec, Johannes Schindelin, git
In-Reply-To: <20071129150849.GA32296@coredump.intra.peff.net>

On Nov 29, 2007 10:08 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 28, 2007 at 03:14:56PM -0800, Junio C Hamano wrote:
>
> >  - Post v1.5.5, start cooking the change that does not install hardlinks
> >    for built-in commands, aiming for inclusion in v1.6.0, by the end of
> >    2008.
>
> I am against this, unless it is configurable. I think the goal of
> reducing user-visible commands is fine, and moving things to
> $(libexecdir) is a good way of doing that.
>
> However, I personally still think the 'git-foo' forms are valuable
> (because fingers have already been trained, and because
> non-bash-programmable completions understand them). And I don't mind
> putting $(libexecdir)/git-core in my PATH to retain this behavior; it's
> a one-time configuration tweak, and it helps new users with the
> overwhelming command set.
>
> But I don't see a point to removing the links entirely. The annoyance
> factor for people who want git-* is much higher, and I don't see that it
> actually buys us any help for new users (who will no longer care after
> everything is hidden in $(libexecdir) anyway).

Maybe only skip installing hardlinks on systems that do not support
them, like Windows? Duplicating git.exe takes a lot of space.
-- 
Duy


* [PATCH] git-svn: Don't create a "master" branch every time rebase is run
From: Steven Grimm @ 2007-11-29 19:54 UTC (permalink / raw)
  To: git

If you run "git-svn rebase" while sitting on a topic branch, there is
no need to create a "master" branch if one didn't exist already. The
branch was created implicitly by the automatic checkout after fetching,
which in the case of rebase isn't actually necessary anyway.

Signed-off-by: Steven Grimm <koreth@midwinter.com>
---
 git-svn.perl |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 43e1591..d483e6b 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -545,6 +545,8 @@ sub cmd_rebase {
 		exit 1;
 	}
 	unless ($_local) {
+		# rebase will checkout for us, so no need to do it explicitly
+		$_no_checkout = 'true';
 		$_fetch_all ? $gs->fetch_all : $gs->fetch;
 	}
 	command_noisy(rebase_cmd(), $gs->refname);
-- 
1.5.3.6.960.g49661


* Re: problem with git detecting proper renames
From: Kumar Gala @ 2007-11-29 19:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <alpine.LFD.0.9999.0711291122050.8458@woody.linux-foundation.org>


On Nov 29, 2007, at 1:27 PM, Linus Torvalds wrote:

>
>
> On Thu, 29 Nov 2007, Kumar Gala wrote:
>>
>> In the case of multiple identical matches can we look at the file
>> name as a possible heuristic?
>
> We already do. But we only do the base-name part and check it for
> exactness, since moving across directories is very common, and we
> explicitly want to pick up files that have the same base name.
>
> However, in your case, not only did you have the same content, you had
> the same basename too! So git considered your renames to be totally
> identical wrt scoring with the current heuristics, and just picked one
> source at random.
>
> And the current heuristics don't even have any "if you already found a
> rename, avoid picking the same one twice", so it would pick the *same*
> source both times, which is why it looked like "two copies and one
> delete".
>
> This is why I'd like to have a real-life example. I can change the
> heuristics, and I even know what are likely to be better heuristics,
> but I still want to actually see and play with an example so that when
> I send Junio a patch, I can explain it and say I've tested it with
> something real..

Ok, this is a real example from the u-boot tree.  If you give me a
little while, I can point you at a kernel.org git tree that shows this
issue.

- k


