Git development


* Re: Selecting the minor revs
From: sean @ 2006-03-28  0:18 UTC (permalink / raw)
  To: Greg Lee; +Cc: git
In-Reply-To: <0e7d01c651fb$fa11ceb0$a100a8c0@casabyte.com>

On Mon, 27 Mar 2006 19:10:09 -0500
"Greg Lee" <glee@swspec.com> wrote:

> > If you're interested in the stable-series releases of the 
> > kernel, unfortunately they're not in the git repository.
> 
> As I feared ... I'm curious, why?

Because the stable series is maintained by people other than Linus.

They may have their own git tree; I'm not sure. Even if they don't,
you could create a stable-series branch and import the patches
into your git repo if it is something you need often.

Sean

^ permalink raw reply

* RE: Problem with git bisect between 2.6.15 and 2.6.16
From: Greg Lee @ 2006-03-28  0:16 UTC (permalink / raw)
  To: 'sean', git
In-Reply-To: <BAYC1-PASMTP036F0DBE8F7101BCAD5470AED30@CEZ.ICE>

> You need to do the bisect start after you cd into the linux-git 
> directory.

Sorry, cut and paste error, I did the cd before the bisect:

[root@Fedora-test git]# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git-fresh
[root@Fedora-test git]# cd linux-git-fresh/
[root@Fedora-test linux-git-fresh]# git bisect start
[root@Fedora-test linux-git-fresh]# git bisect bad v2.6.15
[root@Fedora-test linux-git-fresh]# git bisect good v2.6.16
dab47a31f42a23d2b374e1cd7d0b797e8e08b23d was both good and bad

> Also, it appears you have the good and bad reversed,
> presumably the newer (v2.6.16) is bad, and the older (v2.6.15)
> is good.

No, the problem was fixed in 2.6.16 and I'm trying to figure out what fixed it so that I
can back-port the fix into a previous kernel version, so 2.6.16 is good and 2.6.15 is bad.
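
Hunting for a fix rather than a regression works the same way, because bisection only needs a property that flips once along the history. A minimal Python sketch of that idea, assuming a linear history; this is not git's implementation, and `first_changed`, `history` and `is_fixed` are hypothetical names used only for illustration:

```python
from typing import Callable, Sequence

def first_changed(commits: Sequence[str], has_property: Callable[[str], bool]) -> str:
    """Return the first commit (oldest first) for which has_property is True.

    Assumes the property is False for every commit before some point,
    True from that point on, and True for the last commit.
    """
    lo, hi = 0, len(commits) - 1      # the answer is always within [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if has_property(commits[mid]):
            hi = mid                  # the change happened at mid or earlier
        else:
            lo = mid + 1              # the change happened after mid
    return commits[lo]

# Hypothetical linear history, oldest first; the fix landed in "c5".
history = ["v2.6.15", "c1", "c2", "c3", "c4", "c5", "c6", "v2.6.16"]
fixed_from = history.index("c5")
is_fixed = lambda c: history.index(c) >= fixed_from

print(first_changed(history, is_fixed))   # prints c5
```

When looking for a fix, "has the fix" plays the role that "is broken" plays in a regression hunt, so the labels given to the two endpoints are swapped.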

Greg

^ permalink raw reply

* RE: Selecting the minor revs
From: Greg Lee @ 2006-03-28  0:10 UTC (permalink / raw)
  To: 'sean', git
In-Reply-To: <BAYC1-PASMTP12827905B389678EB07BDAAED30@CEZ.ICE>

> If you're interested in the stable-series releases of the 
> kernel, unfortunately they're not in the git repository.

As I feared ... I'm curious, why?

Greg

^ permalink raw reply

* Re: Problem with git bisect between 2.6.15 and 2.6.16
From: sean @ 2006-03-28  0:06 UTC (permalink / raw)
  To: Greg Lee; +Cc: git
In-Reply-To: <0e7301c651fa$9e0fd770$a100a8c0@casabyte.com>

On Mon, 27 Mar 2006 19:00:25 -0500
"Greg Lee" <glee@casabyte.com> wrote:

> I get the following when I try to git bisect between 2.6.15 and 2.6.16:
>  
> [root@Fedora-test tmp]# git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
> [root@Fedora-test linux-git]# git bisect start
> [root@Fedora-test linux-git]# cd linux-git
> [root@Fedora-test linux-git]# git bisect good v2.6.16
> [root@Fedora-test linux-git]# git bisect bad v2.6.15
> dab47a31f42a23d2b374e1cd7d0b797e8e08b23d was both good and bad
> 
> What is the proper method to do a bisect between 2.6.15 and 2.6.16?
> 

You need to do the bisect start after you cd into the linux-git
directory. Also, it appears you have the good and bad reversed:
presumably the newer (v2.6.16) is bad, and the older (v2.6.15)
is good.

Sean

^ permalink raw reply

* Re: Selecting the minor revs
From: sean @ 2006-03-28  0:02 UTC (permalink / raw)
  To: Greg Lee; +Cc: git
In-Reply-To: <0e6701c651f9$2605aad0$a100a8c0@casabyte.com>

On Mon, 27 Mar 2006 18:49:53 -0500
"Greg Lee" <glee@swspec.com> wrote:

> How do I select one of the "minor" bug fix revs using git?  For example I want to do a git
> bisect between 2.6.15.6 and 2.6.16 but I cannot determine what the naming convention is
> for "2.6.15.6".  I've tried "v2.6.15.6" and "v2.6.15-6".
>  
> Please cc any responses.

If you're interested in the stable-series releases of the kernel, unfortunately they're
not in the git repository. On the other hand, if you're actually talking about the
release candidates that Linus puts out before each new major version, the format is
v2.6.16-rc2, v2.6.15-rc3, etc.

Sean

^ permalink raw reply

* Problem with git bisect between 2.6.15 and 2.6.16
From: Greg Lee @ 2006-03-28  0:00 UTC (permalink / raw)
  To: git

I get the following when I try to git bisect between 2.6.15 and 2.6.16:
 
[root@Fedora-test tmp]# git clone
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
[root@Fedora-test linux-git]# git bisect start
[root@Fedora-test linux-git]# cd linux-git
[root@Fedora-test linux-git]# git bisect good v2.6.16
[root@Fedora-test linux-git]# git bisect bad v2.6.15
dab47a31f42a23d2b374e1cd7d0b797e8e08b23d was both good and bad

What is the proper method to do a bisect between 2.6.15 and 2.6.16?

Greg

^ permalink raw reply

* Selecting the minor revs
From: Greg Lee @ 2006-03-27 23:49 UTC (permalink / raw)
  To: git

How do I select one of the "minor" bug fix revs using git?  For example I want to do a git
bisect between 2.6.15.6 and 2.6.16 but I cannot determine what the naming convention is
for "2.6.15.6".  I've tried "v2.6.15.6" and "v2.6.15-6".

Please cc any responses.

Thanks,
Greg Lee

^ permalink raw reply

* Re: Following renames
From: Petr Baudis @ 2006-03-27 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ryan Anderson, git
In-Reply-To: <20060326232649.GV18185@pasky.or.cz>

Dear diary, on Mon, Mar 27, 2006 at 01:26:49AM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> To quickly see what it does, you can try it e.g. on the git-log.sh file
> in the Git repository.

By the way, the cg-log in master uses it now to automagically follow
file renames (in case you call it per-file like cg-log FILENAME). If you
hate it, you can prevent it by cg-log --no-renames (cg-log -R).

Looks pretty slick.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* [PATCH] cogito: Push tags over http
From: Dennis Stosberg @ 2006-03-27 19:12 UTC (permalink / raw)
  To: git


This trivial patch for cg-push allows tags to be pushed over HTTP.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>

---

 cg-push |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

e9540d5f524c54102a93570031fb59156cec4188
diff --git a/cg-push b/cg-push
index b6b8954..4332b28 100755
--- a/cg-push
+++ b/cg-push
@@ -70,7 +70,7 @@ sprembranch=":refs/heads/$rembranch"
 
 if [ "${uri#http://}" != "$uri" -o "${uri#https://}" != "$uri" ]; then
 	# git-http-push doesn't like $sprembranch
-	git-http-push "$uri/" "$locbranch:$rembranch"
+	git-http-push "$uri/" "$locbranch:$rembranch" "${tags[@]}"
 elif [ "${uri#git+ssh://}" != "$uri" ]; then
 	send_pack_update "$name" "$(echo "$uri" | sed 's#^git+ssh://\([^/]*\)\(/.*\)$#\1:\2#')" "$locbranch$sprembranch" "${tags[@]}"
 elif [ "${uri#rsync://}" != "$uri" ]; then
-- 
1.2.GIT

^ permalink raw reply related

* Re: [PATCH] Reintroduce svn pools to solve the memory leak.
From: Junio C Hamano @ 2006-03-27 18:16 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Jan-Benedict Glaw, git, Karl Hasselström
In-Reply-To: <8aa486160603270326i3a8ddcfau61ca84cdac036ff9@mail.gmail.com>

"Santi Béjar" <sbejar@gmail.com> writes:

> On 3/24/06, Santi Béjar <sbejar@gmail.com> wrote:
>> Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:
>>
>> diff-tree 4802426... (from 525c0d7...)
>> Author: Karl  Hasselström <kha@treskal.com>
>> Date:   Sun Feb 26 06:11:27 2006 +0100
>>
>>     svnimport: Convert executable flag
>>
>>     Convert the svn:executable property to file mode 755 when converting
>>     an SVN repository to GIT.
>>
>>     Signed-off-by: Karl Hasselström <kha@treskal.com>
>>     Signed-off-by: Junio C Hamano <junkio@cox.net>
>>
>> :100755 100755 ee2940f... 6603b96... M  git-svnimport.perl
>
> And this patch fixes my problems.

Jan-Benedict, thanks for pinpointing the regression, and Santi,
thanks for the patch.

I should have looked a bit more closely when applying the patch
-- it is clear that the patch is doing more than what its log
says.  My fault.

Karl, were there other reasons you needed to disable the pool
here (maybe to work around a problem with incompatible version
of SVN module)?  I see some other uses of SVN::Pool still there
in the code, so I am assuming this was a simple typo, but just
in case...

^ permalink raw reply

* Re: git-svn name
From: Chris Wright @ 2006-03-27 17:48 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Gerrit Pape, Chris Wright
In-Reply-To: <20060326030425.GA6306@hand.yhbt.net>

* Eric Wong (normalperson@yhbt.net) wrote:
> Would distro package maintainers also be willing to add my git-svn
> script to their git-svn binary packages when a new release of git is
> made, too?  It's quite different from git-svnimport (see
> contrib/git-svn/git-svn.txt for details).

I think your script name is fine.  Best way to handle this is with a
patch to make your git-svn part of the git-svn packaging.

thanks,
-chris

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-27 16:52 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Jakub Narebski, git
In-Reply-To: <e5bfff550603270319w20796918wc8f8fe30a6c5627@mail.gmail.com>

On Mon, 27 Mar 2006, Marco Costalba wrote:
> >
> > And that's the point. Almost always, we're interested in the _recent_
> > stuff. The fact that it takes longer to get the old history  is not very
> > important. You generally don't ask "what changed in this file" for a file
> > that hasn't changed in five years.
> 
> We could run git-rev-list with a time range specifier (changes of last
> year as example) by default so to have fast results and run all time
> history _only_  on request.

Yes.

However, what I've been meaning to do (but just haven't had the time and 
energy for so far) is to fix "git-rev-list" with a path limiter.

Right now that always causes things to be totally serialized, and the 
revision walking will first look up _all_ the history (well, it will prune 
out the merges) before starting to output stuff.

So right now in order for "git-whatchanged" to be fast and incremental, it 
doesn't do any path limiting with git-rev-list at ALL, and does it all in 
git-diff-tree. Which is horrid.
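
The difference between the two behaviours can be sketched in Python. This is only an illustration of eager versus incremental filtering, not how git-rev-list or git-whatchanged are implemented; `limit_eager`, `limit_lazy`, `walk` and `touches` are hypothetical names:

```python
from typing import Callable, Iterable, Iterator, List

def limit_eager(commits: Iterable[str], touches_path: Callable[[str], bool]) -> List[str]:
    """Walk the whole history first; nothing is available until the walk ends."""
    return [c for c in commits if touches_path(c)]

def limit_lazy(commits: Iterable[str], touches_path: Callable[[str], bool]) -> Iterator[str]:
    """Yield each matching commit as soon as the walk reaches it."""
    for c in commits:
        if touches_path(c):
            yield c

def walk(n: int, visited: List[int]) -> Iterator[str]:
    """Hypothetical history walk, newest first; records every commit it visits."""
    for i in range(n):
        visited.append(i)
        yield f"commit{i}"

# Stand-in for "this commit changed the path we care about".
touches = lambda c: c.endswith("3")

seen_lazy: List[int] = []
first = next(limit_lazy(walk(1000, seen_lazy), touches))

seen_eager: List[int] = []
first_eager = limit_eager(walk(1000, seen_eager), touches)[0]

print(first, len(seen_lazy), len(seen_eager))   # prints commit3 4 1000
```

Both variants find the same first match, but the lazy one visits only four commits before producing it, while the eager one visits all 1000.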

> I still think the problem with annotation is that you don't see
> patches that _remove_ lines of code, you need the whole diff for this.

Well, that's just another reason "annotate" sucks.

If you select a range of lines, my suggested tool _would_ show you lines 
that got removed there, and git-whatchanged does it quite well.

I really think "annotate" is _fundamentally_ a broken operation. It's not 
what any sane developer actually wants, and it has serious limitations (ie 
it depends on whole history, and it cannot show removals well).

		Linus

^ permalink raw reply

* Re: Following renames
From: Andreas Ericsson @ 2006-03-27 12:27 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Linus Torvalds, Jakub Narebski, git
In-Reply-To: <e5bfff550603270355s4b71c306hb4cb2b96eafd0f6e@mail.gmail.com>

Marco Costalba wrote:
> On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
> 
>>In contrast, git-whatchanged will start outputting the recent changes
>>immediately.
>>
> 
> 
> To integrate git-whatchanged like functionality with filter on a
> specific code region, the Linus original request, I am wondering about
> something like this:
> 
> A new git-diff-tree option --range=a..b to delimit a region,
> identified by its line boundaries.
> 

Make it --line-range if you implement this. At first glance I thought
you meant --commit-range.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply

* Re: Following renames
From: Marco Costalba @ 2006-03-27 11:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603270005330.15714@g5.osdl.org>

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> In contrast, git-whatchanged will start outputting the recent changes
> immediately.
>

To integrate git-whatchanged-like functionality with a filter on a
specific code region, which was Linus's original request, I am wondering
about something like this:

A new git-diff-tree option --range=a..b to delimit a region,
identified by its line boundaries.

As an example,

git-diff-tree --range=10..15 HEAD -- <path>

could give these answers, added to the standard git-diff-tree output:

* 10..25 --> region modified; the new region boundaries are lines 10 to 25

  15..20 --> region _NOT_ modified, but the new region boundaries are
lines 15 to 20 (perhaps the patch added 5 lines _before_ the region)

  10..15 --> region _NOT_ modified; lines, if any, were added/removed
_after_ the region

* 10..15 --> region modified with the same boundaries (for example,
removing trailing whitespace)

With this new git-diff-tree option it becomes very simple to retrieve
a file history limited to a region, because the region boundaries
output for one rev are fed as input for the parent rev.
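
The bookkeeping behind such an option can be sketched in Python. This is only an illustration of mapping a line region from a revision back to its parent; the hunk tuple format and the name `map_range_to_parent` are assumptions made for the sketch, not git-diff-tree output or an existing option:

```python
from typing import List, Tuple

# Hypothetical hunk format: (old_start, old_len, new_len), 1-based and sorted.
# old_len parent lines starting at old_start are replaced by new_len child
# lines; a pure insertion has old_len == 0 and goes in before old_start.
Hunk = Tuple[int, int, int]

def map_range_to_parent(a: int, b: int, hunks: List[Hunk]) -> Tuple[int, int, bool]:
    """Map lines a..b of the child file to (parent_a, parent_b, modified).

    parent_a > parent_b means the region did not exist in the parent.
    """
    delta = 0          # (child line - parent line) offset after the hunks seen so far
    shift_before = 0   # offset contributed by hunks lying wholly before the region
    first_old = last_old = None   # parent lines covered by overlapping hunks
    for old_start, old_len, new_len in hunks:
        new_start = old_start + delta
        new_end = new_start + new_len - 1
        if new_start > b:          # hunk lies after the region: no effect
            break
        delta += new_len - old_len
        if new_end < a:            # hunk lies before the region: it only shifts it
            shift_before = delta
        else:                      # hunk touches the region
            if first_old is None:
                first_old = old_start
            last_old = old_start + old_len - 1
    if first_old is None:
        return a - shift_before, b - shift_before, False
    return min(a - shift_before, first_old), max(b - delta, last_old), True

# Five lines were added before the region: child 15..20 was parent 10..15.
print(map_range_to_parent(15, 20, [(3, 0, 5)]))     # prints (10, 15, False)
# Two lines inside the region were replaced by twelve: child 10..25 was parent 10..15.
print(map_range_to_parent(10, 25, [(12, 2, 12)]))   # prints (10, 15, True)
```

Feeding the returned range into the same function for the next older revision gives the per-region history walk described above.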

Comments?

Marco

^ permalink raw reply

* Re: Following renames
From: Johannes Schindelin @ 2006-03-27 11:30 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550603270319w20796918wc8f8fe30a6c5627@mail.gmail.com>

Hi,

On Mon, 27 Mar 2006, Marco Costalba wrote:

> I still think the problem with annotation is that you don't see
> patches that _remove_ lines of code, you need the whole diff for this.

Interesting. You'd need a "git-emalb" (blame, but reverse), and you'd need 
to tell it a range "rev1..rev2" which is *not* to be interpreted as "^rev1 
rev2" but as a direct path from rev1 to rev2.

Ciao,
Dscho

^ permalink raw reply

* [PATCH] Reintroduce svn pools to solve the memory leak.
From: Santi Béjar @ 2006-03-27 11:26 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: git, Junio C Hamano

On 3/24/06, Santi Béjar <sbejar@gmail.com> wrote:
> Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:
>
> > On Wed, 2006-03-22 14:33:37 +0100, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
> >
> > Since it seems nobody looked at the GCC import run (which means to use
> > the svnimport), I ran it again, under strace control:
> >
> >> GCC
> >> ~~~
> >> $ /home/jbglaw/bin/git svnimport -C gcc -v svn://gcc.gnu.org/svn/gcc
> >
> >> Committed change 3936:/ 1993-03-31 05:44:03)
> >> Commit ID ceff85145f8671fb2a9d826a761cedc2a507bd1e
> >> Writing to refs/heads/origin
> >> DONE: 3936 origin ceff85145f8671fb2a9d826a761cedc2a507bd1e
> >> ... 3937 trunk/gcc/final.c ...
> >> Can't fork at /home/jbglaw/bin/git-svnimport line 379.
> >
>
> I have the same (?) problem with one of my svn repository. It worked
> before (I've redone the import with the -r flag), so I bisected it.
> The problematic commit seems to be:
>
> diff-tree 4802426... (from 525c0d7...)
> Author: Karl  Hasselström <kha@treskal.com>
> Date:   Sun Feb 26 06:11:27 2006 +0100
>
>     svnimport: Convert executable flag
>
>     Convert the svn:executable property to file mode 755 when converting
>     an SVN repository to GIT.
>
>     Signed-off-by: Karl Hasselström <kha@treskal.com>
>     Signed-off-by: Junio C Hamano <junkio@cox.net>
>
> :100755 100755 ee2940f... 6603b96... M  git-svnimport.perl
>
> I think it has a memory leak, it used up to 140m of memory.
>
> $ git reset --hard 4802426^
> $ time ../git-svnimport.perl file:///path/
> Use of uninitialized value in string eq at ../git-svnimport.perl line 463.
> Use of uninitialized value in substitution (s///) at ../git-svnimport.perl line 466.
> real    0m55.801s
> user    0m30.578s
> sys     0m23.084s
>
> $ git reset --hard 4802426
> $ time ../git-svnimport.perl file:///path/
> Use of uninitialized value in string eq at ../git-svnimport.perl line 463.
> Use of uninitialized value in substitution (s///) at ../git-svnimport.perl line 466.
> Can't fork at /home/santi/usr/src/scm/git/git-svnimport.perl line 331.
> real    6m2.163s
> user    0m20.332s
> sys     0m50.180s
>
> and it didn't finish. Hope it helps.

And this patch fixes my problems.

---

Introduced in 4802426.

Signed-off-by: Santi Béjar <sbejar@gmail.com>
---
 git-svnimport.perl |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/git-svnimport.perl b/git-svnimport.perl
index 639aa41..f2cf062 100755
--- a/git-svnimport.perl
+++ b/git-svnimport.perl
@@ -135,8 +135,10 @@

        print "... $rev $path ...\n" if $opt_v;
        my (undef, $properties);
+       my $pool = SVN::Pool->new();
        eval { (undef, $properties)
-                  = $self->{'svn'}->get_file($path,$rev,$fh); };
+                  = $self->{'svn'}->get_file($path,$rev,$fh,$pool); };
+       $pool->clear;
        if($@) {
                return undef if $@ =~ /Attempted to get checksum/;
                die $@;

^ permalink raw reply related

* Re: Following renames
From: Marco Costalba @ 2006-03-27 11:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603270005330.15714@g5.osdl.org>

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 27 Mar 2006, Marco Costalba wrote:
> >
> > Historic Linux test (63428 revisions)
> >
> > File: drivers/net/tg3.c
> > Revisions that modify tg3.c : 292
> >
> > With qgit
> > 15s to retrieve file history (git-rev-list)
> > 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)
>
> .. and it does absolutely _nothing_ while it's doing that, does it?
>

yes, it's true.

> > $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> > 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (797major+43033minor)pagefaults 0swaps
>
> In contrast, git-whatchanged will start outputting the recent changes
> immediately.
>
> And that's the point. Almost always, we're interested in the _recent_
> stuff. The fact that it takes longer to get the old history  is not very
> important. You generally don't ask "what changed in this file" for a file
> that hasn't changed in five years.
>

We could run git-rev-list with a time-range specifier by default (for
example, changes from the last year) to get fast results, and walk the
full history _only_ on request.

This could perhaps solve the problem of fast output for recent revs, if
that is the problem.

I still think the problem with annotation is that you don't see
patches that _remove_ lines of code, you need the whole diff for this.

Marco

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-27  8:07 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Jakub Narebski, git
In-Reply-To: <e5bfff550603262147t3aec8da6p6bf2a333e2d35f1d@mail.gmail.com>

On Mon, 27 Mar 2006, Marco Costalba wrote:
> 
> Historic Linux test (63428 revisions)
> 
> File: drivers/net/tg3.c
> Revisions that modify tg3.c : 292
> 
> With qgit
> 15s to retrieve file history (git-rev-list)
> 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

.. and it does absolutely _nothing_ while it's doing that, does it?

> $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (797major+43033minor)pagefaults 0swaps

In contrast, git-whatchanged will start outputting the recent changes 
immediately.

And that's the point. Almost always, we're interested in the _recent_ 
stuff. The fact that it takes longer to get the old history  is not very 
important. You generally don't ask "what changed in this file" for a file 
that hasn't changed in five years.

		Linus

^ permalink raw reply

* Re: Following renames
From: Jakub Narebski @ 2006-03-27  7:53 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.62.0603262337580.26865@qynat.qvtvafvgr.pbz>

David Lang wrote:

> On Mon, 27 Mar 2006, Jakub Narebski wrote:
> 
>> 2.) Caching the results of similarity algorithm/rename detection tool
>> (also Paul Jakma post), including remembering false positives and
>> undetected renames, for efficiency. Calculated automatically parts might
>> be throw-away.
> 
> This sounds like it could easily devolve into an O(n!) situation where you
> are caching how everything is related (or not related) to everything
> else. Paul was making the point that the purpose was to cache the data to
> eliminate the time needed to calculate it, but if you don't store all the
> results then you don't know if the result is not relevant, or unknown, so
> you need to calculate it again.

First of all, you only remember non-trivial relations (the fact that file.c
is related to file.c is never stored). If the cache were only for commits, it
would be O(c*p*n), where c is the number of commits, p is the percentage of
contents moving ("renames") times the percentage of files changed in the
commit, and n is the number of files; in practice probably O(c). Even if we
remember all (tree1,tree2) pairs it would be O(c^2). Additionally, the cache
can be limited in size (pruning the oldest entries).

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Following renames
From: David Lang @ 2006-03-27  7:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e0827k$7tk$1@sea.gmane.org>

On Mon, 27 Mar 2006, Jakub Narebski wrote:

> 2.) Caching the results of similarity algorithm/rename detection tool (also
> Paul Jakma post), including remembering false positives and undetected
> renames, for efficiency. Calculated automatically parts might be
> throw-away.

This sounds like it could easily devolve into an O(n!) situation where you
are caching how everything is related (or not related) to everything
else. Paul was making the point that the purpose was to cache the data to
eliminate the time needed to calculate it, but if you don't store all the
results then you don't know if the result is not relevant, or unknown, so
you need to calculate it again.

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply

* Re: Following renames
From: Junio C Hamano @ 2006-03-27  7:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.64.0603261509320.15714@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> No. "--sparse" still removes the uninteresting parents of merges. It just 
> doesn't then make the linear history any denser.

Hmph, you are right.  add_parents_to_list() calls prune_fn
unconditionally while running limit_list().

Disabling that with yet another flag might be a possibility but
I suspect then it would not be much different from running
rev-list without path limiter and having the caller process the
result.

^ permalink raw reply

* Re: Following renames
From: Jakub Narebski @ 2006-03-27  6:55 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0603260947100.15714@g5.osdl.org>

Linus Torvalds wrote:

> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>> 
>> If (2) is common enough then discussed improvements to rename detection,
>> namely comparing basenames as a base for candidate selection is a good
>> idea.
> 
> BK had this "renametool" which got started automatically when you applied
> a patch that removed one or more files and added one or more files, so
> that you could then pair up the files manually.
[...]
> The thing is, the fast rename detection that is in the "next" branch
> really does a lot better, and it's fast enough.

I was thinking about the fast rename detection algorithm in the "next" branch.

The question is whether recording additional (helper) information about
contents copying and moving, as the mentioned "renametool" did, is worth
the effort, both in coding it and from the user's point of view, or whether
better detection of contents copying and moving ("rename detection") in
whatchanged and similar tools would suffice.

I am of the opinion that voluntary information about contents moving and
copying in the commits would help.

Purposes:
1.) Record contents moving and similarity information which cannot be
calculated, or cannot be calculated easily; see Paul Jakma's response in
this thread,
  MessageID: <Pine.LNX.4.64.0603270642090.5276@sheen.jakma.org>
for example copying a fragment of code (a small fragment of the whole file),
creating a documentation or header file from code, creating a code skeleton
from a template, or rewriting code in a different language (e.g. shell
script to Perl, or script to compiled code, e.g. Perl or Python to C).
2.) Caching the results of the similarity algorithm/rename detection tool
(also from Paul Jakma's post), including remembering false positives and
undetected renames, for efficiency. The automatically calculated parts
might be throw-away.

Sources of information:
1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp
like commands (which nobody likes) and a systematized notation
(pseudolanguage?) for copying and moving contents in the log messages.
2.) Semi-manual tools like the mentioned "renametool" of BK.
3.) Support from the editor (remembering where a copied and pasted, or cut
and pasted, fragment came from, and providing a prefilled command to record
contents moving ("renames") or a prefilled commit log containing this
information). Hard to get, probably most useful.
4.) Information from resolved merges and results of diagnostic (pickaxe-like)
tools, especially recording "renames" which were not detected, and removing
"renames" which were detected falsely.

Is that the place where I should provide code (patch) for testing the
idea :) ?

>> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
>> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c
>
> For example, one common case was a directory structure like
> 
> ..
> type-file1.c
> type-file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> being split up into a subdirectory
> 
> ..
> type/file1.c
> type/file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> (eg drivers/scsi/aic7xx-* being given a subdirectory of its own, as
> drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it
> contained some piece of data that became redundant with the move.

Perhaps the fast rename detection algorithm needs some smart similarity
estimate for names, which would put more weight on the parts closer to the
basename, and would detect */type-file1.c and */type/file1.c as similar.
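
A rough Python sketch of such a name estimate, purely as an illustration: the tokenisation, the `decay` weighting and the names `name_tokens` and `name_similarity` are all assumptions made here, not anything in the "next" branch.

```python
import re
from typing import List

def name_tokens(path: str) -> List[str]:
    """Split a path on '/', '.', '_' and '-' so that type-file1.c and
    type/file1.c produce the same token sequence."""
    return [t for t in re.split(r"[/._-]+", path) if t]

def name_similarity(a: str, b: str, decay: float = 0.5) -> float:
    """Compare tokens starting from the end of the path; each step away
    from the basename counts for `decay` times as much as the previous one."""
    ta, tb = name_tokens(a)[::-1], name_tokens(b)[::-1]
    n = max(len(ta), len(tb))
    if n == 0:
        return 1.0
    score = total = 0.0
    weight = 1.0
    for i in range(n):
        total += weight
        if i < len(ta) and i < len(tb) and ta[i] == tb[i]:
            score += weight
        weight *= decay
    return score / total

same = name_similarity("drivers/scsi/type-file1.c", "drivers/scsi/type/file1.c")
other = name_similarity("drivers/scsi/otherfiles.c", "drivers/scsi/type/file1.c")
print(same, other)
```

The split-up name scores as a perfect match, while an unrelated file in the same directory scores lower. One weakness of this particular weighting is that the file extension, being the last token, carries the most weight.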

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Following renames
From: Junio C Hamano @ 2006-03-27  6:46 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550603262147t3aec8da6p6bf2a333e2d35f1d@mail.gmail.com>

"Marco Costalba" <mcostalba@gmail.com> writes:

> NOTE: It seems that git-whatchanged needs the file to be checked out
> to work. It didn't work with no repository checked out.

Perhaps,

	$ git-whatchanged HEAD -- drivers/net/tg3.c

as Linus explained in a separate message today...

^ permalink raw reply

* Re: Following renames
From: Paul Jakma @ 2006-03-27  6:00 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e05354$cc9$1@sea.gmane.org>

On Sun, 26 Mar 2006, Jakub Narebski wrote:

> I think one of the better ideas/suggestions about *recording* filenames was
> in the "impure renames / history tracking" thread
> http://marc.theaimsgroup.com/?l=git&m=114122175216489&w=2
> <Pine.LNX.4.64.0603011343170.13612@sheen.jakma.org>

For the record, the responses I received were educational ;).
Sufficiently so that I no longer think renames should be recorded. At
least, definitely not as renames.

I now grok the reasoning for doing it by 'similarity' - it is indeed
a *much* more useful concept. (E.g. the 'pickaxe' idea people keep
alluding to sounds amazingly useful.)

So the question really is what, if any, weaknesses does the current 
similarity estimation have, and how to solve them. I can think of two 
weaknesses:

1. the similarity algorithms can be expensive potentially, and they
    essentially get run a lot with the same inputs, to produce the
    same results - over and over as one works with a git repo. (there
    was a thread a while ago on this I think).

2. Some 'similarities' are just not deducible by current software
    state of the art. E.g. where some code is rewritten in another
    language:

 	foo.X -> foo.Y

    The high-level algorithms may remain the exact same, but the code
    may be unrecognisable as similar except to a human. However,
    tracking history back across this rewrite probably would still be
    valuable to the human.

So I think what /might/ be interesting is to have a 'similarity
cache', which would help 1, and to allow for manual injection of such
hints (into a separate and stronger cache most likely) - which would
help 2.

Something to record the following information:

(tree1,tree2)[1]:
 	Id1 <-> Id1'
 	.
 	.
 	.
 	Idn <-> Idn'
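
As a rough Python sketch of what such a record could look like, assuming it is indexed by the (tree1,tree2) pair; the class `SimilarityCache`, its methods and the ids used are hypothetical and only illustrate the two layers (throw-away computed results and stronger manual hints):

```python
from typing import Callable, Dict, List, Tuple

TreePair = Tuple[str, str]
Pairs = Dict[str, str]          # maps Id -> Id'

class SimilarityCache:
    def __init__(self, estimate: Callable[[str, str], Pairs]) -> None:
        self._estimate = estimate                    # the expensive similarity estimator
        self._computed: Dict[TreePair, Pairs] = {}   # throw-away, can be rebuilt
        self._hints: Dict[TreePair, Pairs] = {}      # manual hints, kept separately

    def add_hint(self, tree1: str, tree2: str, old_id: str, new_id: str) -> None:
        """Record a relation the estimator cannot find by itself."""
        self._hints.setdefault((tree1, tree2), {})[old_id] = new_id

    def pairs(self, tree1: str, tree2: str) -> Pairs:
        """Return Id -> Id' for the two trees, running the estimator at most once."""
        key = (tree1, tree2)
        if key not in self._computed:
            self._computed[key] = dict(self._estimate(tree1, tree2))
        merged = dict(self._computed[key])
        merged.update(self._hints.get(key, {}))      # hints override computed results
        return merged

    def invalidate_computed(self) -> None:
        """Drop computed results, e.g. when a better heuristic comes along."""
        self._computed.clear()

calls: List[TreePair] = []

def slow_estimate(tree1: str, tree2: str) -> Pairs:
    calls.append((tree1, tree2))        # count how often the estimator runs
    return {"id_foo_c": "id_bar_c"}

cache = SimilarityCache(slow_estimate)
cache.add_hint("t1", "t2", "id_foo_sh", "id_foo_pl")
first = cache.pairs("t1", "t2")
second = cache.pairs("t1", "t2")
print(second, len(calls))
```

The second lookup is answered from the cache, and the manual hint survives `invalidate_computed()` because it is stored in its own table.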

That would allow:

1. Performance repercussions of similarity estimation to be one-time,
    cached there-after. (throw-away information, if a better
    similarity estimation heuristic comes along, you can rebuild this
    cache)

2. The user to inject their own 'hints' into similarity estimation,
    particularly for cases that just aren't obvious and probably never
    will be to software estimators (e.g. the rewrite cases), but where
    the user sees value in being able to follow back the history.

Avoids:

- encoding anything permanently into the repository (which was
   something I was thinking of, and others before me apparently, but
   which I now accept would be an awful idea ;) ).

[1] I'm not sure if it should be indexed by (commit ID) or by a
    (tree1,tree2) tuple.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Men take only their needs into consideration -- never their abilities.
 		-- Napoleon Bonaparte

^ permalink raw reply

* Re: Following renames
From: Marco Costalba @ 2006-03-27  5:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603261422280.15714@g5.osdl.org>

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sun, 26 Mar 2006, Marco Costalba wrote:
> >
> > FIRST WAY
> >
> > After annotating a file history (double click on a file name in
> > bottom-right window or directly from tree view), you see the whole
> > file annotated. If you have the diff window open you see also the
> > corresponding patch (scrolled to selected file name).
>
> The problem is that this step is already _way_ too expensive.
>
> I don't want to work with any tool that makes "Step 1" take a minute or
> two for a project that has a few years of history. Try it on the linux
> historic project with some file that gets lots of modifications.
>

Historic Linux test (63428 revisions)

File: drivers/net/tg3.c
Revisions that modify tg3.c : 292

With qgit
15s to retrieve file history (git-rev-list)
19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

and...

$ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (797major+43033minor)pagefaults 0swaps

NOTE: It seems that git-whatchanged needs the file to be checked out
to work. It didn't work with no repository checked out.


Marco

^ permalink raw reply
