Git development
* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 11:28 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> There are a few things we need to be careful about rename/copy.
>...
>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.
>
> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

If people are well disciplined, code refactoring (which can
trigger rename/copy detection) tends to affect both source and
destination files at the same time, so many times -C finds what
you want without --find-copies-harder.

But sometimes the source stays the same and you literally have a
duplicate (possibly with some modifications) in the new
destination.  Finding an exact copy is cheap (diffcore-rename has a
double loop that first finds exact copies without similarity
estimation, which is very cheap, and then goes on to open blobs
and does its similarity magic for destinations whose origin is
still unknown), but copy/rename with edit is not, and the "harder"
variant feeds _everything_ from the older tree as a candidate
copy source, so it is very expensive for huge projects.
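
A minimal sketch of the cheap exact-match idea, not diffcore-rename
itself.  It assumes two listings in `git ls-tree -r` format (mode, type,
hash, TAB, path) and reports new paths whose blob hash already existed
under some path in the older tree.  The function name
`find_exact_copies` and its file arguments are hypothetical.  No blob is
opened, which is why this kind of pass is cheap; a copy that was edited
afterwards is invisible to it.

```shell
# find_exact_copies OLD NEW
# OLD and NEW hold `git ls-tree -r` style lines:
#   <mode> SP <type> SP <hash> TAB <path>
find_exact_copies() {
	awk -F '\t' '
		# Split the metadata column to get at the blob hash.
		{ split($1, meta, " "); hash = meta[3]; path = $2 }
		# First file: remember every hash and path of the older tree.
		FILENAME == ARGV[1] {
			if (!(hash in source)) source[hash] = path
			in_old[path] = 1
			next
		}
		# Second file: a new path with a known hash is an exact copy.
		!(path in in_old) && (hash in source) {
			printf "%s -> %s\n", source[hash], path
		}
	' "$1" "$2"
}
```

With real trees the two listings could come from `git ls-tree -r HEAD^`
and `git ls-tree -r HEAD`.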

> In the kernel archive, 
>
> 	git show -C ad2f931d
>
> tells us that:
>
>  - drivers/i2c/chips/Kconfig lost major part of it and only
>    skeletal part of the original remains in it;
>
>  - major part of it went to drivers/hwmon/Kconfig;
>
> The story is similar to the Makefile next door.

Having said all that, I think the rename/copy as a wholesale
operation on one file is an uninteresting special case.  The
generic case that happens far more often in practice is
lines moving around across files, and the new "git blame" gives
you a better picture to answer the "where the heck did this come
from" question.

For example,

	git blame -f -n -C 'ad2f931d^!' -- drivers/hwmon/Kconfig

on the same commit would show that many of its lines came from
i2c/chips/Kconfig but not all of them.

There are quite a few other things I should probably mention for
new people on the list about rename/copy/break heuristics but it
is getting late so I'd defer it to some other time.


* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 11:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <7vd57i4cij.fsf@assigned-by-dhcp.cox.net>

On Monday 2006 November 20 10:51, Junio C Hamano wrote:

> I think Alex (Riesen) is saying "you (Alex Litvinov) were
> wondering why you do not see the commit log message but only the
> first line. That is because you are using --pretty=oneline.
> Lose it, then you would get what you want because giving the log
> message _is_ the default".

You're right.  Apologies to Alex for my misunderstanding.

-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Rename detection at git log
From: Jakub Narebski @ 2006-11-20 11:15 UTC (permalink / raw)
  To: git
In-Reply-To: <200611201101.04456.andyparkins@gmail.com>

Andy Parkins wrote:

> On Monday 2006 November 20 10:48, Junio C Hamano wrote:
>
>>  - Copies are only picked up from files that were changed in the
>>    same change (i.e. splitting major part of original file and
>>    moving it to somewhere else, while leaving a skeleton in the
>>    original file).  "harder" is needed if the copy original was
>>    untouched, as you found out.
> 
> Yep; I understand that.  I also understand that it is done for performance 
> reasons.  However, since the typical copy will be one where the source 
> doesn't change at the same time, I am arguing that the non-hard copy 
> detection isn't much use.

I'm not sure about this. You usually do both pure renames (to reorganize
files, to give a file a better name) and renames with modification, but
I don't think that copy without modification is very common. Usually you
copy a file because you take one file as a template for the other, or you
split a file, or you join files into one file.
 
>> The last one is a compromise between performance and thoroughness,
>> and the "harder" is one knob to tweak its behaviour.
> 
> I've been poking in tree-diff.c to see if I can understand why it is such a 
> performance hog.  I still haven't.  Each file is stored under its hash right?  
> So for copy detection why can't you just search for other files with the same 
> hash, which I presume is very fast (as it is the basis of what makes git so 
> fast)?

Copy and rename detection are done by comparing the contents and calculating
similarity. So to check if files A and B are copies (not necessarily pure
copies) it is not enough to compare hashes.

That said, it should be fairly easy (if not that useful in real projects
as I understand it, as stated above) to add detection of pure copies by
comparing hashes to copy detection. Still, --find-copies-harder would be
needed if the copy original was untouched while the copy itself was modified.
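
To make "comparing the contents" concrete, here is a crude stand-in for
a similarity score.  It is only a sketch: it counts shared lines, which
is an assumption made for illustration, and git's real estimator is more
refined.  The function name `similarity` is hypothetical.

```shell
# similarity A B: percentage of lines shared by files A and B,
# relative to the larger of the two files.
similarity() {
	sorted_a=$(mktemp) || return 1
	sorted_b=$(mktemp) || return 1
	sort "$1" >"$sorted_a"
	sort "$2" >"$sorted_b"
	# comm -12 prints only the lines present in both sorted inputs.
	common=$(($(comm -12 "$sorted_a" "$sorted_b" | wc -l)))
	lines_a=$(($(wc -l <"$1")))
	lines_b=$(($(wc -l <"$2")))
	rm -f "$sorted_a" "$sorted_b"
	larger=$lines_a
	if test "$lines_b" -gt "$larger"
	then
		larger=$lines_b
	fi
	if test "$larger" -eq 0
	then
		# Two empty files are treated as identical.
		echo 100
		return 0
	fi
	echo $((100 * common / larger))
}
```

Two files with the same hash score 100 without any of this work, which
is the cheap case; a copy'n'change scores somewhere below 100, and the
hash alone cannot tell you that.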

> I am probably misunderstanding git, but I guess that a copy isn't even needed 
> in the database because two files with the same hash in the working copy only 
> need storing once and then referencing twice.  So for a copy (again, with my 
> simple understanding of git) we'd have:
> 
>  commit1 -> tree1 -> fileA = fileA_hash
>     ^
>     |
>  commit2 -> tree2 -> fileA = fileA_hash
>                      fileB = fileB_hash
> 
> Doesn't that mean that copy detection is just a matter of searching the parent 
> commit trees for references to the same hash?

Think copy'n'change.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 11:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>

On Monday 2006 November 20 10:48, Junio C Hamano wrote:

> I wrote the code and you contradict me ;-)?

Sorry; I wasn't so much contradicting that the filtering works exactly as you 
say (of course it must - I don't know anywhere near enough to make that sort 
of assertion).

However, I do think that the problem is not one of filtering.  I was saying 
that "-C" has no practical use.

> in your example, it would give you the creation of fileB, not
> copy.

I'm sure it would - but you had to use --find-copies-harder; -C would not find 
it as a copy.

>  - Renames are only picked up from files that were lost in the
>    same change (i.e. "mv fileA fileB" creates fileB and loses
>    fileA; fileB is checked if it is similar to fileA in the
>    original).

I've found rename detection to be flawless in all my uses.

>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.

Yep; I understand that.  I also understand that it is done for performance 
reasons.  However, since the typical copy will be one where the source 
doesn't change at the same time, I am arguing that the non-hard copy 
detection isn't much use.

> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

I've been poking in tree-diff.c to see if I can understand why it is such a 
performance hog.  I still haven't.  Each file is stored under its hash right?  
So for copy detection why can't you just search for other files with the same 
hash, which I presume is very fast (as it is the basis of what makes git so 
fast)?

I am probably misunderstanding git, but I guess that a copy isn't even needed 
in the database because two files with the same hash in the working copy only 
need storing once and then referencing twice.  So for a copy (again, with my 
simple understanding of git) we'd have:

 commit1 -> tree1 -> fileA = fileA_hash
    ^
    |
 commit2 -> tree2 -> fileA = fileA_hash
                     fileB = fileB_hash

Doesn't that mean that copy detection is just a matter of searching the parent 
commit trees for references to the same hash?


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Making literal "20" symbolic
From: Junio C Hamano @ 2006-11-20 10:54 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200611201049.41024.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> I notice that there are a lot of uses of the literal 20 throughout git; I'd 
> like to change them (as appropriate) to HASH_WIDTH, or similar; and maybe 
> HASH_WIDTH_ASCII for the 40s.
>
> Is there a particular header file that is appropriate to put 
>
> #define HASH_WIDTH 20
> #define HASH_WIDTH_ASCII (HASH_WIDTH*2)
>
> Of course, I plan to review each instance to make sure I'm not changing a 
> non-hash width 20.

Probably in cache.h, close to where it defines is_null_sha1(),
hashcmp(), and friends, is the right place.

There are a few places that say 42 (because we have 40-hex at the
beginning of the line, followed by a single whitespace, and then
something should follow, so the line length must be at least 42
chars), so hunting them all would be a lot of work, but I do
think this is a worthwhile cleanup.
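
One way to start the hunt, as a rough sketch: list every line that
contains 20, 40 or 42 as a whole word, then review each hit by hand as
proposed.  The helper name `list_hash_literals` is hypothetical, and the
sketch assumes a grep that supports -w (GNU and BSD grep do).

```shell
# list_hash_literals FILE...: print candidate lines, with line numbers,
# that contain 20, 40 or 42 as a whole word (so 200 or 420 is skipped).
list_hash_literals() {
	grep -nwE '20|40|42' "$@"
}
```

Something like `list_hash_literals *.c *.h` in the source tree would give
the review list; it cannot tell a hash width from any other 20.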


* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 10:51 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200611201023.54146.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> On Monday 2006 November 20 10:06, Alex Riesen wrote:
>
>> remove --pretty=oneline, it is default behavior of git log.
>
> No it's not; are you confusing it with --pretty=short?

I think Alex (Riesen) is saying "you (Alex Litvinov) were
wondering why you do not see the commit log message but only the
first line. That is because you are using --pretty=oneline.
Lose it, then you would get what you want because giving the log
message _is_ the default".


* Making literal "20" symbolic
From: Andy Parkins @ 2006-11-20 10:49 UTC (permalink / raw)
  To: git

Hello,

I notice that there are a lot of uses of the literal 20 throughout git; I'd 
like to change them (as appropriate) to HASH_WIDTH, or similar; and maybe 
HASH_WIDTH_ASCII for the 40s.

Is there a particular header file that is appropriate to put 

#define HASH_WIDTH 20
#define HASH_WIDTH_ASCII (HASH_WIDTH*2)

Of course, I plan to review each instance to make sure I'm not changing a 
non-hash width 20.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 10:48 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <200611201022.10656.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> On Monday 2006 November 20 10:07, Junio C Hamano wrote:
>
>> The real issue here is because the b/a on the command line
>> applies on the input-side, and does not act as the output
>> filter.  This comes from _very_ early design decision and if you
>> dig the list archive you will see Linus and I arguing about
>> diffcore-pathspec (which later died off).
>
> I don't think so; even without the b/a on the command line,
> git does not find copies made in this way...

I wrote the code and you contradict me ;-)?

Trust me, I know this area reasonably well, to the point that
sometimes I wonder if there is a sane and cheap way to change
the meaning of the pathspec to be an output filter and then
quickly say "Nah" to myself.

If you say

	git diff --find-copies-harder HEAD^..HEAD -- fileB

in your example, it would give you the creation of fileB, not
a copy.

There are a few things we need to be careful about rename/copy.

 - Typically too small files are not treated as copies unless
   they are identical copies (does not apply to this case,
   luckily).

 - Renames are only picked up from files that were lost in the
   same change (i.e. "mv fileA fileB" creates fileB and loses
   fileA; fileB is checked if it is similar to fileA in the
   original).

 - Copies are only picked up from files that were changed in the
   same change (i.e. splitting major part of original file and
   moving it to somewhere else, while leaving a skeleton in the
   original file).  "harder" is needed if the copy original was
   untouched, as you found out.

The last one is a compromise between performance and thoroughness,
and the "harder" is one knob to tweak its behaviour.

In the kernel archive, 

	git show -C ad2f931d

tells us that:

 - drivers/i2c/chips/Kconfig lost major part of it and only
   skeletal part of the original remains in it;

 - major part of it went to drivers/hwmon/Kconfig;

The story is similar to the Makefile next door.


* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 10:23 UTC (permalink / raw)
  To: git
In-Reply-To: <81b0412b0611200206q4ded162drdc450715d7f801e0@mail.gmail.com>

On Monday 2006 November 20 10:06, Alex Riesen wrote:

> remove --pretty=oneline, it is default behavior of git log.

No it's not; are you confusing it with --pretty=short?


Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 10:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Alexander Litvinov
In-Reply-To: <7vejry5t4g.fsf@assigned-by-dhcp.cox.net>

On Monday 2006 November 20 10:07, Junio C Hamano wrote:

> The real issue here is because the b/a on the command line
> applies on the input-side, and does not act as the output
> filter.  This comes from _very_ early design decision and if you
> dig the list archive you will see Linus and I arguing about
> diffcore-pathspec (which later died off).

I don't think so; even without the b/a on the command line, git does not find 
copies made in this way...

$ git init-db
defaulting to local storage area
$ date > fileA
$ git add fileA
$ git commit -a -m "fileA"
Committing initial tree 3ef607fd139dd955f868305462d99dfc4cfff70f
$ cp fileA fileB
$ git add fileB
$ git commit -a -m "fileA -> fileB"

Now let's try and get git-diff to notice this was a copy...

$ git diff HEAD^..HEAD | cat
diff --git a/fileB b/fileB
new file mode 100644
index 0000000..ec620df
--- /dev/null
+++ b/fileB
@@ -0,0 +1 @@
+Mon Nov 20 10:16:29 GMT 2006
$ git diff -C HEAD^..HEAD | cat
diff --git a/fileB b/fileB
new file mode 100644
index 0000000..ec620df
--- /dev/null
+++ b/fileB
@@ -0,0 +1 @@
+Mon Nov 20 10:16:29 GMT 2006
$ git diff --find-copies-harder HEAD^..HEAD | cat
diff --git a/fileA b/fileB
similarity index 100%
copy from fileA
copy to fileB

As I said - I don't see what "-C" ever does for you in all but the rarest of 
uses.  --find-copies-harder is the only way to list copies successfully.  
It's nothing to do with any input or output filtering.



Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* [PATCH] git-merge: make it usable as the first class UI
From: Junio C Hamano @ 2006-11-20 10:17 UTC (permalink / raw)
  To: git; +Cc: linux
In-Reply-To: <7v8xi67qhq.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> So if we rename the current "git merge" to "git-merge--record"
> (or any name "git pull" uses internally to record the merge
> commit), and make "git merge" a synonym to "git pull .", and
> give a command line option -m to "git pull" to _affect_ the
> resulting merge message, I would think everybody would become
> quite happy.  It means:
>
>  - People can say "git merge this-branch" (which is internally
>    translated to "git pull . this-branch");
>
>  - People can say "git pull -m 'I am doing this merge for such
>    and such reason' $URL $branch" to _include_ that message in
>    the resulting merge commit;
>
>  - The same can be said about "git merge -m 'comment' $branch".
>
> I said _affect_ and _include_ in the above because I suspect
> that most of the time you do not want to _replace_ the
> autogenerated part ("Merge branch of repo", and if you are
> pulling from your subordinate trees the merge summary message as
> well).

I did a moral equivalent of the above without renaming the
command "git merge" and will be pushing the result out in "pu"
shortly.

The following is for commenting only -- it depends on an earlier
patch in "pu".

-- >8 --
[PATCH] git-merge: make it usable as the first class UI

This teaches the oft-requested syntax

	git merge $commit

to implement merging the named commit to the current branch.
This hopefully would make "git merge" usable as the first class
UI instead of being a mere backend for "git pull".

Most notably, $commit above can be any committish, so you can
say for example:

	git merge js/shortlog~2

to merge early part of a topic branch without merging the rest
of it.

A custom merge message can be given with the new --message=<msg>
parameter.  The message is prepended in front of the usual
"Merge ..." message autogenerated with fmt-merge-message.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 Documentation/git-merge.txt |   18 +++++++++---
 git-merge.sh                |   61 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 67 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index bebf30a..e2954aa 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -8,12 +8,14 @@ git-merge - Grand Unified Merge Driver
 
 SYNOPSIS
 --------
-'git-merge' [-n] [--no-commit] [-s <strategy>]... <msg> <head> <remote> <remote>...
-
+[verse]
+'git-merge' [-n] [--no-commit] [--squash] [-s <strategy>]...
+	[--reflog-action=<action>]
+	-m=<msg> <remote> <remote>...
 
 DESCRIPTION
 -----------
-This is the top-level user interface to the merge machinery
+This is the top-level interface to the merge machinery
 which drives multiple merge strategy scripts.
 
 
@@ -27,13 +29,19 @@ include::merge-options.txt[]
 	to give a good default for automated `git-merge` invocations.
 
 <head>::
-	our branch head commit.
+	Our branch head commit.  This has to be `HEAD`, so new
+	syntax does not require it
 
 <remote>::
-	other branch head merged into our branch.  You need at
+	Other branch head merged into our branch.  You need at
 	least one <remote>.  Specifying more than one <remote>
 	obviously means you are trying an Octopus.
 
+--reflog-action=<action>::
+	This is used internally when `git-pull` calls this command
+	to record that the merge was created by `pull` command
+	in the `ref-log` entry that results from the merge.
+
 include::merge-strategies.txt[]
 
 
diff --git a/git-merge.sh b/git-merge.sh
index 84c3acf..25deb1e 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -3,7 +3,8 @@
 # Copyright (c) 2005 Junio C Hamano
 #
 
-USAGE='[-n] [--no-commit] [--squash] [-s <strategy>]... <merge-message> <head> <remote>+'
+USAGE='[-n] [--no-commit] [--squash] [-s <strategy>] [--reflog-action=<action>] [-m=<merge-message>] <commit>+'
+
 . git-sh-setup
 
 LF='
@@ -92,7 +93,7 @@ finish () {
 
 case "$#" in 0) usage ;; esac
 
-rloga=
+rloga= have_message=
 while case "$#" in 0) break ;; esac
 do
 	case "$1" in
@@ -125,17 +126,63 @@ do
 	--reflog-action=*)
 		rloga=`expr "z$1" : 'z-[^=]*=\(.*\)'`
 		;;
+	-m=*|--m=*|--me=*|--mes=*|--mess=*|--messa=*|--messag=*|--message=*)
+		merge_msg=`expr "z$1" : 'z-[^=]*=\(.*\)'`
+		have_message=t
+		;;
+	-m|--m|--me|--mes|--mess|--messa|--messag|--message)
+		shift
+		case "$#" in
+		1)	usage ;;
+		esac
+		merge_msg="$1"
+		have_message=t
+		;;
 	-*)	usage ;;
 	*)	break ;;
 	esac
 	shift
 done
 
-merge_msg="$1"
-shift
-head_arg="$1"
-head=$(git-rev-parse --verify "$1"^0) || usage
-shift
+# This could be traditional "merge <msg> HEAD <commit>..."  and the
+# way we can tell it is to see if the second token is HEAD, but some
+# people might have misused the interface and used a committish that
+# is the same as HEAD there instead.  Traditional format never would
+# have "-m" so it is an additional safety measure to check for it.
+
+if test -z "$have_message" &&
+	second_token=$(git-rev-parse --verify "$2^0" 2>/dev/null) &&
+	head_commit=$(git-rev-parse --verify "HEAD" 2>/dev/null) &&
+	test "$second_token" = "$head_commit"
+then
+	merge_msg="$1"
+	shift
+	head_arg="$1"
+	shift
+else
+	# We are invoked directly as the first-class UI.
+	head_arg=HEAD
+
+	# All the rest are the commits being merged; prepare
+	# the standard merge summary message to be appended to
+	# the given message.  If remote is invalid we will die
+	# later in the common codepath so we discard the error
+	# in this loop.
+	merge_name=$(for remote
+		do
+			rh=$(git-rev-parse --verify "$remote"^0 2>/dev/null)
+			if git show-ref -q --verify "refs/heads/$remote"
+			then
+				what=branch
+			else
+				what=commit
+			fi
+			echo "$rh		$what '$remote'"
+		done | git-fmt-merge-msg
+	)
+	merge_msg="${merge_msg:+$merge_msg$LF$LF}$merge_name"
+fi
+head=$(git-rev-parse --verify "$head_arg"^0) || usage
 
 # All the rest are remote heads
 test "$#" = 0 && usage ;# we need at least one remote head.
-- 
1.4.4.gbacc
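
Two shell idioms from the patch above, pulled out into a standalone
sketch so they can be tried in isolation.  The function names
`option_value` and `compose_merge_msg` are hypothetical; the `expr`
pattern and the `${var:+...}` expansion are the ones the patch uses.

```shell
LF='
'

# Extract <msg> from "-m=<msg>" or "--message=<msg>".  The leading "z"
# keeps expr from mistaking the option for one of its own operators.
option_value() {
	expr "z$1" : 'z-[^=]*=\(.*\)'
}

# Prepend an optional custom message to the autogenerated one.  The
# ${custom:+...} part expands to nothing when no message was given, so
# no stray blank lines end up in front of the generated text.
compose_merge_msg() {
	custom=$1
	generated=$2
	printf '%s\n' "${custom:+$custom$LF$LF}$generated"
}
```

For example, `compose_merge_msg "Why we merge" "Merge commit 'x'"` prints
the custom text, a blank line, and then the generated line.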



* git-diff opens too many files?
From: Nguyen Thai Ngoc Duy @ 2006-11-20 10:12 UTC (permalink / raw)
  To: git

I got this error in a quite big (in files) repository:
error: open("vnexpress.net/Suc-khoe/2001/04/3B9AF976"): Too many open
files in system
fatal: cannot hash vnexpress.net/Suc-khoe/2001/04/3B9AF976

The repository contained about 67.000 files and probably all were modified.
git version 1.4.4.rc1.g2bba
-- 


* Re: Rename detection at git log
From: Jakub Narebski @ 2006-11-20 10:11 UTC (permalink / raw)
  To: git
In-Reply-To: <7vejry5t4g.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Andy Parkins <andyparkins@gmail.com> writes:
> 
>> On Monday 2006 November 20 05:57, Alexander Litvinov wrote:
>>
>>> > PAGER=cat git log -M -C --pretty=oneline b/a
>>
>> I've come across this too.  Personally I'm not sure what use "-C" is.  From 
>> the manpage, man git-diff-files (no, this isn't the place I'd look either).
> 
> The real issue here is because the b/a on the command line
> applies on the input-side, and does not act as the output
> filter.  This comes from _very_ early design decision and if you
> dig the list archive you will see Linus and I arguing about
> diffcore-pathspec (which later died off).
> 
> What it means is that "git log" will look at path that matches
> b/a (that means b/a/c and b/a/d are looked at, if b/a were a
> directory).  Since path "a" which is what the file was
> originally at is not something the pattern b/a matches, there is
> no way b/a is noticed as a rename from a.
> 
> I've been meaning to resurrect Fredrik's --single-follow=path
> patch but haven't had time to recently, with all the other
> interesting discussion happening on the list.

But for now, you can use

  PAGER= git log -M -C -- b/a a

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 10:07 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Alexander Litvinov
In-Reply-To: <200611200951.05529.andyparkins@gmail.com>

Andy Parkins <andyparkins@gmail.com> writes:

> On Monday 2006 November 20 05:57, Alexander Litvinov wrote:
>
>> > PAGER=cat git log -M -C --pretty=oneline b/a
>
> I've come across this too.  Personally I'm not sure what use "-C" is.  From 
> the manpage, man git-diff-files (no, this isn't the place I'd look either).

The real issue here is that the b/a on the command line
applies on the input side, and does not act as an output
filter.  This comes from a _very_ early design decision and if you
dig the list archive you will see Linus and I arguing about
diffcore-pathspec (which later died off).

What it means is that "git log" will look only at paths that match
b/a (that means b/a/c and b/a/d are looked at, if b/a were a
directory).  Since path "a", which is where the file was
originally, is not something the pattern b/a matches, there is
no way b/a is noticed as a rename from a.

I've been meaning to resurrect Fredrik's --single-follow=path
patch but haven't had time to recently, with all the other
interesting discussion happening on the list.
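
The input-side behaviour can be reproduced in a throwaway repository.
This is a sketch under assumptions: the directory, identity and commit
messages are made up, and it expects a git binary on PATH.

```shell
#!/bin/sh
# Throwaway repository; every name below is made up for illustration.
command -v git >/dev/null 2>&1 || { echo "git is required" >&2; exit 0; }
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

date >a
git add a
git commit -q -m "create a"
mkdir b
git mv a b/a
git commit -q -m "rename a to b/a"

# Only b/a is fed to the diff machinery, so the rename looks like an add.
limited=$(git log -M --name-status --pretty=format:%s -- b/a)
# Feeding both the old and the new path lets -M pair them up.
both=$(git log -M --name-status --pretty=format:%s -- b/a a)

printf 'limited to b/a:\n%s\n\nb/a and a:\n%s\n' "$limited" "$both"
```

The first listing reports b/a as added; the second reports a rename from
a to b/a, because both sides of the rename were part of the input.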



* Re: Rename detection at git log
From: Alex Riesen @ 2006-11-20 10:06 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git
In-Reply-To: <200611201157.23680.litvinov2004@gmail.com>

On 11/20/06, Alexander Litvinov <litvinov2004@gmail.com> wrote:
> How can I see all changes for one file ? Including renames/copies ?

git log -M -C -r --name-status

> PAGER=cat git log -M -C --pretty=oneline b/a
>
> At the last line I would like to see two commits: renaming a -> b/a and creation
> of a. By the way, how can I see the commit message with git log?



* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20  9:50 UTC (permalink / raw)
  To: git; +Cc: Alexander Litvinov
In-Reply-To: <200611201157.23680.litvinov2004@gmail.com>

On Monday 2006 November 20 05:57, Alexander Litvinov wrote:

> > PAGER=cat git log -M -C --pretty=oneline b/a

I've come across this too.  Personally I'm not sure what use "-C" is.  From 
the manpage, man git-diff-files (no, this isn't the place I'd look either).

--find-copies-harder
For performance reasons, by default, -C option finds copies only if the 
original file of the copy was modified in the same changeset. This flag makes 
the command inspect unmodified files as candidates for the source of copy. 
This is a very expensive operation for large projects, so use it with 
caution.

That is to say that unless the file you are copying was modified AND copied in 
the same commit, it won't be searched as a potential source for the copy 
operation.  I think it would be rare to make a copy of a file I had modified, 
surely I'd want to check in modifications before making a copy?

Regardless, to get the results you want, use the stronger 
switch --find-copies-harder, heeding the warning that on big projects it will 
be very slow.



Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: [WISH] Store also tag dereferences in packed-refs
From: Jakub Narebski @ 2006-11-20  9:40 UTC (permalink / raw)
  To: git
In-Reply-To: <7vr6vy7smi.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
>> So I'd suggest adding - at the very top of the ref-pack file - a line like
>>
>>      # Ref-pack version 2
>>
>> which will be ignored by the current ref-pack reader (again, because it's 
>> not a valid ref line), but we can use it in the future to specify further 
>> extensions if we want to.
>>
>> Now somebody would just need to implement that ;)
> 
> For this particular one, there is no need for version 2.

Actually, I think it is both true and untrue. True, because we need some
indicator that we can trust the packed-refs file to provide tag dereferences,
to distinguish between the case when there are no tag objects at all, so there
are no tag dereferences in packed-refs, and the situation where we use
packed-refs generated by an older git, and there are no tag dereferences in
packed-refs because git didn't save them.

Untrue, because it is not enough. In the case[*1*] when packed-refs was
created with tag dereferences, then some "heavyweight" tags were added
by older version of git (adding references doesn't rewrite packed-refs
if I understand correctly), then we use new git again and trust that there
are no derefs...

[*1*] For example when the git repository is on a network filesystem, but
programs are installed locally, and perhaps computers in the network are
heterogeneous (perhaps even different architectures: PC vs. Sun and/or
different operating systems: Linux vs. FreeBSD vs. Solaris vs.
Windows+Cygwin) and have different versions of git installed (perhaps
one of them is "your" machine, where you have admin rights, and you have
the newest git installed there). Or for example using a git repository on a USB
stick, again on different computers with different versions of git installed.
 
---------------------------------------------------------------------

To summarize, we have the following proposals of the packed-refs format
extension


The unusable Linus Torvalds proposal (unusable because it requires
a newer packed-refs file to work with older git, for example in the case [*1*]
or in the case of a git downgrade):

lt>    <sha1><space><name>[<space><sha1-of-deref>]*


Linus Torvalds "Now somebody would just need to implement that ;)"
proposal:

lt>    <sha1> refs/tags/tagname
lt>    ^<sha1-unpeeled>
lt>    ^<sha1-unpeeled-of-unpeeled>
lt>    ...
lt>    <sha1> refs/tags/othertag


Junio C Hamano proposal _with code_ (proposal with code usually wins).
Less elegant IMVHO, but perhaps better.

jc> My current wip does:
jc> 
jc>     SHA-1 SP name LF
jc>     SHA-1 SP SP name^{} LF
jc> 
jc> the latter of which is ignored by code in the wild and the new
jc> code can take advantage of (and fall back the usual deref_tag
jc> when it is not available).
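
A minimal sketch of a reader for the WIP format quoted above (name line,
then an optional line with two spaces and `name^{}`).  The function name
`peeled_sha1` is hypothetical and this is not git's own reader; it only
shows how a consumer could use the extra line and fall back when it is
absent.

```shell
# peeled_sha1 FILE REF: print the dereferenced object name recorded for
# REF, or fail so the caller can fall back to dereferencing the tag.
peeled_sha1() {
	while IFS= read -r line
	do
		case "$line" in
		*"  $2^{}")
			# Everything before the first space is the SHA-1.
			echo "${line%% *}"
			return 0
			;;
		esac
	done <"$1"
	return 1
}
```

A caller would try `peeled_sha1 packed-refs refs/tags/v1.0` first and
dereference the tag object itself only when that fails.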

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Patch to tutorial.txt
From: Paolo Ciarrocchi @ 2006-11-20  9:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200611201025.11048.jnareb@gmail.com>

On 11/20/06, Jakub Narebski <jnareb@gmail.com> wrote:
> Paolo Ciarrocchi wrote:
> > On 11/20/06, Jakub Narebski <jnareb@gmail.com> wrote:
> >
> >> followed by empty line, then signoff line, for example
> >>
> >>   Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
> >
> > Ok, but the Signed-off-by part should be handled by the -s option in
> > git-format-patch.
>
> Signed-off-by _can_ be added by -s option in git-format-patch, but
> I think it is usually better to have it added in the commit, by the -s
> option to git-commit.

Oh, I was not aware of that either. Maybe it's worth mentioning
in the tutorial.

I'll properly redo the patch later today or tomorrow.

Thanks!
Ciao,
-- 
Paolo
http://docs.google.com/View?docid=dhbdhs7d_4hsxqc8
http://www.linkedin.com/pub/0/132/9a3
Non credo nelle otto del mattino. Però esistono. Le otto del mattino
sono l'incontrovertibile prova della presenza del male nel mondo.


* Re: Patch to tutorial.txt
From: Jakub Narebski @ 2006-11-20  9:25 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: git
In-Reply-To: <4d8e3fd30611200110y224b5b8dpf974d30d738455c9@mail.gmail.com>

Paolo Ciarrocchi wrote:
> On 11/20/06, Jakub Narebski <jnareb@gmail.com> wrote:
>
>> followed by empty line, then signoff line, for example
>>
>>   Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
> 
> Ok, but the Signed-off-by part should be handled by the -s option in
> git-format-patch.

Signed-off-by _can_ be added by -s option in git-format-patch, but 
I think it is usually better to have it added in the commit, by the -s 
option to git-commit.

-- 
Jakub Narebski


* Re: Patch to tutorial.txt
From: Paolo Ciarrocchi @ 2006-11-20  9:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200611200949.32722.jnareb@gmail.com>

On 11/20/06, Jakub Narebski <jnareb@gmail.com> wrote:
> On Mon, 20 Nov 2006, Paolo Ciarrocchi wrote:
> > On 11/19/06, Jakub Narebski <jnareb@gmail.com> wrote:
> >> Paolo Ciarrocchi wrote:
>
> >>> From: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
> >>> Date: Sun, 19 Nov 2006 23:41:31 +0100
> >>> Subject: [PATCH] One of the comment was not really clear, rephrased to
> >>> make it easier to be understood by the reader
> >>
> >> Wordwrap. Perhaps it would be better to split description into short line,
> >> and two-line description.
>
> See http://git.or.cz/gitwiki/CommitMessageConventions

Thanks! I was not aware of that.

> In short, it is better to split the description into a short one-line
> summary, for example
>   "Documentation: Make comment about merging in tutorial.txt more clear"
> followed by an empty line, then a longer description of the changes (if any),
> for example
>
>   One of the comment was not really clear, rephrased to make it easier
>   to be understood by the reader
>
> followed by an empty line, then the sign-off line, for example
>
>   Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>

Ok, but the Signed-off-by part should be handled by the -s option in
git-format-patch.

> > This is not clear to me. When I do a "git commit -a" I can add text using vi;
> > should I manually split the text into multiple lines?
> > Will only the first line be part of the Subject?
>
> Yes. The rest will be in the email body.
>
> >> [...]
> >>>  ------------------------------------------------
> >>>
> >>>  at this point the two branches have diverged, with different changes
> >>> -made in each.  To merge the changes made in the two branches, run
> >>> +made in each.  To merge the changes made in experimental into master run
> >>
> >> I would rather say:
> >>   To merge the changes made in the two branches into master, run
> >
> > Why, Jakub? There are only two branches, master and experimental.
> > While sitting in master and doing git pull . experimental I would
> > expect to merge what I did in experimental into master. Changes made in
> > master are already in master. Am I wrong?
>
> For me, "merge" in the phrase "to merge the changes" is merge in the
> common-sense meaning of the word, not the SCM jargon. Merge the changes == join
> the changes, so you have to give both sides, both changes you join.
>
> Merge the changes == take changes in branch 'experimental' since forking,
> take changes in branch 'master' since forking, join those changes
> together (merge), and put the result of this joining (this merge) into
> branch 'master'.
>
> In contrast, in the phrase "merge branch 'experimental' into 'master'",
> "merge" has the SCM meaning of the word.
>
>
> Just my 2 eurocents as a non-native English speaker...

I'm not a native English speaker either; furthermore, I'm still not
confident with git, so your comments are more than appreciated!

Ciao,
-- 
Paolo
http://docs.google.com/View?docid=dhbdhs7d_4hsxqc8
http://www.linkedin.com/pub/0/132/9a3
I don't believe in eight in the morning. Yet it exists. Eight in the morning
is the incontrovertible proof of the presence of evil in the world.


* Re: Patch to tutorial.txt
From: Jakub Narebski @ 2006-11-20  8:49 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: git
In-Reply-To: <4d8e3fd30611200030p1d117445qd3f7d619c18a0633@mail.gmail.com>

On Mon, 20 Nov 2006, Paolo Ciarrocchi wrote:
> On 11/19/06, Jakub Narebski <jnareb@gmail.com> wrote:
>> Paolo Ciarrocchi wrote:

>>> From: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
>>> Date: Sun, 19 Nov 2006 23:41:31 +0100
>>> Subject: [PATCH] One of the comment was not really clear, rephrased to
>>> make it easier to be understood by the reader
>>
>> Wordwrap. Perhaps it would be better to split description into short line,
>> and two-line description.

See http://git.or.cz/gitwiki/CommitMessageConventions

In short, it is better to split the description into a short one-line
summary, for example
  "Documentation: Make comment about merging in tutorial.txt more clear"
followed by an empty line, then a longer description of the changes (if any),
for example

  One of the comment was not really clear, rephrased to make it easier
  to be understood by the reader

followed by an empty line, then the sign-off line, for example

  Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>

> This is not clear to me. When I do a "git commit -a" I can add text using vi;
> should I manually split the text into multiple lines?
> Will only the first line be part of the Subject?

Yes. The rest will be in the email body.
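
The layout described above can be tried out in a throwaway repository. This is
only a sketch: the identity, the file name and the wording of the body are
placeholders, and it assumes a Git recent enough to accept several -m options
and the --format placeholders %s (summary line) and %b (rest of the message).

```shell
#!/bin/sh
# Sketch: one-line summary, blank line, longer description, sign-off.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity; it is also what -s puts on the Signed-off-by line.
git config user.name "A U Thor"
git config user.email "author@example.com"

echo "hello" > tutorial.txt
git add tutorial.txt

# Each -m becomes its own paragraph; -s appends the Signed-off-by line.
git commit -q -s \
    -m "Documentation: Make comment about merging in tutorial.txt more clear" \
    -m "One of the comments was not really clear; rephrased to make it easier
to understand."

# %s is the summary line (what ends up in the email Subject),
# %b is everything after the first blank line (the email body).
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
printf 'Subject: %s\n\n%s\n' "$subject" "$body"
```

When the message is typed in an editor instead, the same rule applies: the
first line is the summary, and an empty line separates it from the rest.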
 
>> [...]
>>>  ------------------------------------------------
>>>
>>>  at this point the two branches have diverged, with different changes
>>> -made in each.  To merge the changes made in the two branches, run
>>> +made in each.  To merge the changes made in experimental into master run
>>
>> I would rather say:
>>   To merge the changes made in the two branches into master, run
> 
> Why, Jakub? There are only two branches, master and experimental.
> While sitting in master and doing git pull . experimental I would
> expect to merge what I did in experimental into master. Changes made in
> master are already in master. Am I wrong?

For me, "merge" in the phrase "to merge the changes" is merge in the
common-sense meaning of the word, not the SCM jargon. Merge the changes == join
the changes, so you have to give both sides, both changes you join.

Merge the changes == take changes in branch 'experimental' since forking,
take changes in branch 'master' since forking, join those changes
together (merge), and put the result of this joining (this merge) into
branch 'master'.
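
A small experiment makes the "both sides" reading concrete. It is only a
sketch: the file names and identity are made up, and it uses "git merge
experimental" where the tutorial text under discussion uses
"git pull . experimental".

```shell
#!/bin/sh
# Sketch: two branches diverge, then both sets of changes are joined in master.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Make sure the first branch is called master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
git config user.name "A U Thor"
git config user.email "author@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Changes in 'experimental' since forking.
git checkout -q -b experimental
echo "from experimental" > exp.txt
git add exp.txt
git commit -q -m "Change on experimental"

# Changes in 'master' since forking.
git checkout -q master
echo "from master" > master.txt
git add master.txt
git commit -q -m "Change on master"

# Join both sets of changes and record the result on 'master'.
git merge -q -m "Merge branch 'experimental'" experimental

# The merge commit has two parents, and the tree holds both changes.
git log -1 --format='parents: %P'
ls
```

After the merge, master.txt and exp.txt are both present on master, while the
experimental branch itself is left unchanged.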

In contrast, in the phrase "merge branch 'experimental' into 'master'",
"merge" has the SCM meaning of the word.


Just my 2 eurocents as a non-native English speaker...
-- 
Jakub Narebski


* Re: [PATCH] Document git-runstatus
From: Jakub Narebski @ 2006-11-20  8:11 UTC (permalink / raw)
  To: git
In-Reply-To: <20061120071529.GF3315@always.joy.eth.net>

Joshua N Pritikin wrote:

> On Sun, Nov 19, 2006 at 07:13:08PM +0100, Petr Baudis wrote:
>> BTW, I've finally found a fine example of a situation parallel to Git:
>> TeX!  There are the core TeX commands (plumbing) and plain TeX (basic
>> porcelain) on top of that as well as a bunch of other macro sets (other
>> porcelains). Now I need to dig out The TeXbook from wherever I've put it
>> to see how Knuth dealt with it, documentation-wise.
> 
> Gahh! Please don't use TeX as an example. As far as I know, TeX doesn't 
> offer lexical scope. 

It offers grouping.

> Hence, action-at-a-distance is commonplace, which
> makes program execution extremely difficult for mere mortals to
> predict. I am constantly amazed at the popularity of TeX, in spite of its
> grave deficiencies. Perhaps there isn't a good alternative yet.

TeX (even plain TeX) is like the assembler of programming languages. One
usually uses one of the TeX macro sets, such as LaTeX, ConTeXt or texinfo.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Feature request: git-pull -e/--edit
From: Jakub Narebski @ 2006-11-20  8:02 UTC (permalink / raw)
  To: git
In-Reply-To: <7v8xi67qhq.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> So if we rename the current "git merge" to "git-merge--record"

or git-merge-driver

> (or any name "git pull" uses internally to record the merge
> commit), and make "git merge" a synonym to "git pull .", and
> give a command line option -m to "git pull" to _affect_ the
> resulting merge message, I would think everybody would become
> quite happy.  It means:
> 
>  - People can say "git merge this-branch" (which is internally
>    translated to "git pull . this-branch");
> 
>  - People can say "git pull -m 'I am doing this merge for such
>    and such reason' $URL $branch" to _include_ that message in
>    the resulting merge commit;
> 
>  - The same can be said about "git merge -m 'comment' $branch".
> 
> I said _affect_ and _include_ in the above because I suspect
> that most of the time you do not want to _replace_ the
> autogenerated part ("Merge branch of repo", and if you are
> pulling from your subordinate trees the merge summary message as
> well).

I'm all for adding a -m <msg> option to git-pull (and perhaps also the other
common message options: -F <file>, --edit). I'm even for adding
a -m option to git-merge.

Making "git merge" a synonym for "git pull ."... I'm not so sure.
I'd rather we didn't lose the ability to give arbitrary refs as "other"
heads, as in

  git merge "Merge early part of branch 'topicA'" HEAD topicA~3

example, and the ability (if there is such an ability) to not include HEAD (the
current version of the branch) as the first parent, as in

  git checkout pu
  git merge "Merge branch 'topicA', 'topicB'" topicA topicB
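
For comparison, here is a sketch of the first example written with the -m
option proposed above. It assumes a Git in which "git merge -m <msg> <commit>"
is available and accepts an arbitrary commit such as topicA~3; the branch
contents and identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: merge only the early part of a topic branch, with an explicit message.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "A U Thor"
git config user.email "author@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Build topicA with four commits, so that topicA~3 is its first commit.
git checkout -q -b topicA
for n in 1 2 3 4; do
    echo "step $n" > "topic$n.txt"
    git add "topic$n.txt"
    git commit -q -m "topicA step $n"
done

# Diverge on master so that the merge cannot be a fast-forward.
git checkout -q master
echo "master work" > master.txt
git add master.txt
git commit -q -m "Work on master"

# Any commit can be named as the "other" head, not only a branch tip.
git merge -q -m "Merge early part of branch 'topicA'" topicA~3

merge_subject=$(git log -1 --format=%s)
second_parent=$(git rev-parse HEAD^2)
early_commit=$(git rev-parse topicA~3)
printf '%s\n' "$merge_subject"
```

Only topic1.txt arrives on master; the three later topicA commits stay
unmerged and can be merged later.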

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [PATCH] Document git-runstatus
From: Joshua N Pritikin @ 2006-11-20  7:15 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Git Mailing List
In-Reply-To: <20061119181307.GY7201@pasky.or.cz>

On Sun, Nov 19, 2006 at 07:13:08PM +0100, Petr Baudis wrote:
> BTW, I've finally found a fine example of a situation parallel to Git:
> TeX!  There are the core TeX commands (plumbing) and plain TeX (basic
> porcelain) on top of that as well as a bunch of other macro sets (other
> porcelains). Now I need to dig out The TeXbook from wherever I've put it
> to see how Knuth dealt with it, documentation-wise.

Gahh! Please don't use TeX as an example. As far as I know, TeX doesn't
offer lexical scope. Hence, action-at-a-distance is commonplace, which
makes program execution extremely difficult for mere mortals to
predict. I am constantly amazed at the popularity of TeX, in spite of its
grave deficiencies. Perhaps there isn't a good alternative yet.



* Rename detection at git log
From: Alexander Litvinov @ 2006-11-20  5:57 UTC (permalink / raw)
  To: git

How can I see all changes for one file, including renames and copies? Currently
I don't know how to see them:

> mkdir 1 && cd 1 && git init-db
defaulting to local storage area
> date >> a
> git add a
> git commit -a -m "1"
Committing initial tree c47d83a6544612309aad57519ca831cf62a489d5
> mkdir b
> git mv a b/
> git commit -a -m "2"
> PAGER=cat git log -M -C --pretty=oneline b/a
3b591f7147ee8dbe15fdf456db5730072d41bed8 2
>

On the last line I would like to see two commits: the rename a -> b/a and the
creation of a. By the way, how can I see the commit message with git log?
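
For readers reproducing this today, the sketch below rebuilds the same history
and compares a plain path-limited log with "git log --follow". The --follow
option does not come up in this thread; the sketch assumes a Git new enough to
have it, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch: the scenario above, with and without --follow.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "A U Thor"
git config user.email "author@example.com"

date > a
git add a
git commit -q -m "1"
mkdir b
git mv a b/
git commit -q -m "2"

# Limiting by path alone stops at the commit that introduced b/a.
plain=$(git log --format=%s -- b/a)

# --follow (a single path only) keeps going across the rename.
followed=$(git log --follow --format=%s -- b/a)

printf 'plain:\n%s\n\nfollowed:\n%s\n' "$plain" "$followed"
```

As for the commit message: --pretty=oneline prints only the first line of each
message after the commit id, so dropping that option shows the full message.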

Thanks for help.


