* Why git-whatchanged shows a commit touching every file, but git-log doesn't?
@ 2013-01-31 19:09 Constantine A. Murenin
2013-01-31 19:34 ` Jonathan Nieder
0 siblings, 1 reply; 2+ messages in thread
From: Constantine A. Murenin @ 2013-01-31 19:09 UTC
To: git
Hi,
DragonFly BSD uses git as its SCM, with one single repository and
branch for both the kernel and the whole userland.
On 2011-11-26 (1322296064), someone did a commit that somehow touched
every single file in the repository, even though most of the files
were not modified one bit.
That's the offending commit from 2011-11-26:
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/86d7f5d305c6adaa56ff4582ece9859d73106103
https://github.com/DragonFlyBSD/DragonFlyBSD/commit/86d7f5d305c6adaa56ff4582ece9859d73106103
Since then, if you look at the history of any file anywhere in the
repo, some tools show that the file was changed on 2011-11-26 by that
commit, while other tools do not show the commit at all.
For example, the bogus 2011-11-26 commit is not shown with the following:
* git log sys/sys/sensors.h
* https://github.com/DragonFlyBSD/DragonFlyBSD/commits/master/sys/sys/sensors.h
However, the bogus commit is [erroneously] shown with the following:
% git whatchanged --pretty=%at sys/sys/sensors.h | cat
1322296064
:000000 100644 0000000... 554cfc2... A sys/sys/sensors.h
1191329821
:000000 100644 0000000... 554cfc2... A sys/sys/sensors.h
%
Notice how the file was [A]dded once again at 1322296064 without ever
being deleted, and that the dst sha1 is the same for both the latest
and the immediately prior revision of the file.
Gitweb, unlike github, would also show the erroneous commit from 2011-11:
http://gitweb.dragonflybsd.org/dragonfly.git/history/HEAD:/sys/sys/sensors.h
Another, more representative example, which shows that src sha1 (field
names are documented in git-diff-tree(1)) is always "0000000..." in
such bogus touch-all commits (even though it makes little sense when
you consider that the files were never deleted and still have the same
dst sha1):
% git whatchanged --pretty=%at sys/sys/sysctl.h | head -9
1322296064
:000000 100644 0000000... 6659977... A sys/sys/sysctl.h
1296826445
:100644 100644 94b8d96... 6659977... M sys/sys/sysctl.h
1292413105
:100644 100644 8c9deaa... 94b8d96... M sys/sys/sysctl.h
%
So, my questions are as follows:
* How was it possible for all these files to be added without having
been deleted first? Was / is it a bug in git (during a commit) to
allow something like that?
* Why do some tools compact such bogus commits out (and hide them from
the user), but some don't?
* Is there a way to make git-whatchanged and gitweb ignore such bogus
commits on files that weren't actually modified, just as git-log and
github already do?
P.S. I've asked this question on
http://stackoverflow.com/q/14632828/1122270, if anyone wants a cookie.
Best regards,
Constantine.
* Re: Why git-whatchanged shows a commit touching every file, but git-log doesn't?
From: Jonathan Nieder @ 2013-01-31 19:34 UTC
To: Constantine A. Murenin; +Cc: git
Hi Constantine,
Constantine A. Murenin wrote:
> DragonFly BSD uses git as its SCM, with one single repository and
> branch for both the kernel and the whole userland.
>
> On 2011-11-26 (1322296064), someone did a commit that somehow touched
> every single file in the repository, even though most of the files
> were not modified one bit.
"gitk --simplify-by-decoration" might provide some insight.
In the dragonfly history, it seems that imports of packages typically
proceed in two steps:
 1. First, the upstream code is imported as a new "initial commit"
    with no history:

	cd ~/src
	git init gcc-4.7.2-import
	cd gcc-4.7.2-import
	tar -xf /path/to/gcc-4.7.2
	mkdir contrib
	mv gcc-4.7.2 contrib/gcc-4.7
	git add .
	git commit -m 'Import gcc-4.7.2 to new vendor branch'

 2. Next, that code is incorporated into dragonfly:

	cd ~/src/dragonfly
	git fetch ../gcc-4.7.2-import master:refs/heads/vendor/GCC47
	git merge vendor/GCC47
	rm -fr ../gcc-4.7.2-import
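
A guard along the following lines could be run between the fetch and
the merge in step 2. It is only a sketch: check_import_paths is a
made-up helper name, and it assumes that a vendor branch should contain
files under a single directory (contrib/gcc-4.7 in the example above)
and nothing else.

```shell
# Hypothetical helper (not part of git): succeed only if every file in
# the tip of <branch> lives under <prefix>/.
# usage: check_import_paths <branch> <prefix>
check_import_paths() {
    # List every path in the branch tip; awk prints the ones that do not
    # start with "<prefix>/" and exits non-zero if it found any.
    git ls-tree -r --name-only "$1" |
        awk -v prefix="$2/" '
            index($0, prefix) != 1 { print "outside " prefix ": " $0; bad = 1 }
            END { exit bad }'
}
```

With that in place, the merge could be written as
"check_import_paths vendor/GCC47 contrib/gcc-4.7 && git merge vendor/GCC47".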
Unfortunately in the commit you mentioned, someone made a mistake.
Instead of importing a single new upstream package, the author
imported the entire dragonfly tree as a new vendor branch. Oops.
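
One way to spot such an import after the fact is to list the parentless
(root) commits reachable from a branch. Each vendor import adds one, and
a root whose tree holds the whole project rather than a single package
stands out by its file count. A minimal sketch, where list_roots is a
name invented for illustration:

```shell
# Hypothetical helper (not part of git): print
# "<sha1> <number of files> <subject>" for every parentless commit
# reachable from the given ref (default: HEAD).
list_roots() {
    for c in $(git rev-list --max-parents=0 "${1:-HEAD}"); do
        # Count the files recorded in the root commit's tree.
        n=$(git ls-tree -r --name-only "$c" | wc -l | tr -d ' ')
        echo "$c $n $(git log -1 --pretty=%s "$c")"
    done
}
```

Running "list_roots master" in the dragonfly repository should then
show the accidental import as the root with by far the largest count.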
The effects might be counterintuitive:
* tools like "git blame" and path-limited "git log" get a choice:
when looking at the merge that pulled a copy of dragonfly into
the existing dragonfly codebase, either parent is an equally
sensible explanation, from blame's point of view, of the origin
of this code. I think both prefer the first parent here, so they
happen to produce the "right" result.
* tools like "git show" that describe what change a commit made
get a choice: when looking at a parentless commit, the diff that
brings a project into existence may or may not be interesting,
depending on the situation.
See
http://thread.gmane.org/gmane.comp.version-control.git/182571/focus=182577
for more about that.
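
The effect can be reproduced in a scratch repository. The sketch below
builds a one-file history, creates a parentless commit with the same
tree, and merges it. Two things here are assumptions rather than
anything established in this thread: the file and commit names are
invented, and "git log --full-history" stands in for "git whatchanged"
on the understanding that whatchanged walks every parent of a merge
instead of simplifying history the way path-limited "git log" does.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# The real history: one commit that adds the file.
echo '/* sensors */' > sensors.h
git add sensors.h
git commit -q -m 'real add'

# The mistake: a parentless commit holding exactly the same tree ...
bogus=$(git commit-tree -m 'bogus import' 'HEAD^{tree}')
# ... merged into the existing history without changing any file.
merge=$(git commit-tree -p HEAD -p "$bogus" -m 'merge bogus import' 'HEAD^{tree}')
git reset -q --hard "$merge"

# Default simplification follows one parent that has the same content.
default_count=$(git log --pretty=%s -- sensors.h | grep -c '')
# Full history walks both parents, so both "adds" of the file show up.
full_count=$(git log --full-history --pretty=%s -- sensors.h | grep -c '')
echo "default: $default_count commit(s), full history: $full_count commit(s)"
```

The default walk should report one commit and the full-history walk
two. If the assumption about whatchanged holds, something like
"git log --raw --pretty=%at sys/sys/sensors.h" would give
whatchanged-style output while keeping log's simplification.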
But at its heart, this is just an instance of "lie when creating your
history and history-mining tools will lie back to you." :)
Hoping that clarifies a little,
Jonathan