* git describe --contains doesn't work properly for a commit
@ 2015-02-26 13:35 Michal Hocko
2015-02-26 14:23 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-02-26 13:35 UTC (permalink / raw)
To: git
Hi,
I have just encountered an old kernel git commit:
commit c854363e80b49dd04a4de18ebc379eb8c8806674
Author: Dave Chinner <david@fromorbit.com>
Date: Sat Feb 6 12:39:36 2010 +1100
xfs: Use delayed write for inodes rather than async V2
[...]
which cannot be described properly:
$ git describe --contains c854363e80b49dd04a4de18ebc379eb8c8806674
fatal: cannot describe 'c854363e80b49dd04a4de18ebc379eb8c8806674'
but it seems to find a tag on which the commit is based:
$ git describe c854363e80b49dd04a4de18ebc379eb8c8806674
v2.6.33-rc4-49-gc854363e80b4
if I follow parents
sha=c854363e80b49dd04a4de18ebc379eb8c8806674;
while true
do
parent=$(git show --format=%P $sha | head -1)
echo $sha $parent
git describe --contains $parent && break
sha=$parent
done
c854363e80b49dd04a4de18ebc379eb8c8806674 777df5afdb26c71634edd60582be620ff94e87a0
fatal: cannot describe '777df5afdb26c71634edd60582be620ff94e87a0'
777df5afdb26c71634edd60582be620ff94e87a0 d5db0f97fbbeff11c88dec1aaf1536a975afbaeb
fatal: cannot describe 'd5db0f97fbbeff11c88dec1aaf1536a975afbaeb'
d5db0f97fbbeff11c88dec1aaf1536a975afbaeb 388f1f0c346b533b06d8bc792f7204ebc3e4b7da
v2.6.34-rc1~278^2~14
I am using:
$ git --version
git version 2.1.4
but the same seems to be the case with older git version (1.8.5.6).
$ git rev-list c854363e80b49dd04a4de18ebc379eb8c8806674..v2.6.34 | wc -l
11648
So there seems to be a line between the two commits AFAIU.
Is the history somehow broken or is it a bug in git?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: git describe --contains doesn't work properly for a commit 2015-02-26 13:35 git describe --contains doesn't work properly for a commit Michal Hocko @ 2015-02-26 14:23 ` Michal Hocko 2015-03-04 10:54 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Michal Hocko @ 2015-02-26 14:23 UTC (permalink / raw) To: git On Thu 26-02-15 14:35:34, Michal Hocko wrote: > Hi, > I have just encountered an old kernel git commit: > commit c854363e80b49dd04a4de18ebc379eb8c8806674 > Author: Dave Chinner <david@fromorbit.com> > Date: Sat Feb 6 12:39:36 2010 +1100 > > xfs: Use delayed write for inodes rather than async V2 > [...] OK, I've managed to recreate this in a simple repo with 3 commits: $ git log --format="%H %cd" ab0efec2b697f2f9f864bb0e2cd77308d1f04561 Thu Feb 26 15:18:36 2015 +0100 d63972e4e4e7eda0444e56739ad09bfbc476b9bd Wed Feb 26 15:18:30 2014 +0100 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3 Thu Feb 26 14:57:29 2015 +0100 The commit in the middle was ammended to have committer date in the past. $ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd tag~1 but $ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3 fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3' I guess this is the same issue reported previously here: http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html Can this be fixed somehow or it would lead to other kind of issues? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-02-26 14:23 ` Michal Hocko @ 2015-03-04 10:54 ` Jeff King 2015-03-04 15:06 ` Michael J Gruber 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2015-03-04 10:54 UTC (permalink / raw) To: Michal Hocko; +Cc: git On Thu, Feb 26, 2015 at 03:23:14PM +0100, Michal Hocko wrote: > The commit in the middle was ammended to have committer date in the > past. > $ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd > tag~1 > > but > $ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3 > fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3' > > I guess this is the same issue reported previously here: > http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html Yes, the "describe --contains" algorithm uses timestamps to cut off the traversal, so it can do the wrong thing if there's clock skew. It has a "slop" margin of one day, but skew larger than that can fool it. > Can this be fixed somehow or it would lead to other kind of issues? The options are basically: 1. Stop cutting off the traversal based on timestamps. This will make the common case of valid timestamps much slower, though, as it will have to walk all the way to the roots. 2. Use a different slop mechanism. For example, keep walking up to 5 commits past a commit suspected to be past the cutoff. This is relatively easy to do (we do it for "--since" checks), and would catch your case above. But of course it does not catch all cases of skew. 3. Introduce a more trust-worthy mechanism for ordering commits. The timestamp here is really just a proxy for the oft-discussed "generation number" of the commit within the graph. We've avoided adding generation numbers because of the storage/complexity issues. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-04 10:54 ` Jeff King @ 2015-03-04 15:06 ` Michael J Gruber 2015-03-04 18:05 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Michael J Gruber @ 2015-03-04 15:06 UTC (permalink / raw) To: Jeff King, Michal Hocko; +Cc: git Jeff King venit, vidit, dixit 04.03.2015 11:54: > On Thu, Feb 26, 2015 at 03:23:14PM +0100, Michal Hocko wrote: > >> The commit in the middle was ammended to have committer date in the >> past. >> $ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd >> tag~1 >> >> but >> $ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3 >> fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3' >> >> I guess this is the same issue reported previously here: >> http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html > > Yes, the "describe --contains" algorithm uses timestamps to cut off the > traversal, so it can do the wrong thing if there's clock skew. It has a > "slop" margin of one day, but skew larger than that can fool it. > >> Can this be fixed somehow or it would lead to other kind of issues? > > The options are basically: > > 1. Stop cutting off the traversal based on timestamps. This will make > the common case of valid timestamps much slower, though, as it will > have to walk all the way to the roots. > > 2. Use a different slop mechanism. For example, keep walking up to 5 > commits past a commit suspected to be past the cutoff. This is > relatively easy to do (we do it for "--since" checks), and would > catch your case above. But of course it does not catch all cases of > skew. > > 3. Introduce a more trust-worthy mechanism for ordering commits. The > timestamp here is really just a proxy for the oft-discussed > "generation number" of the commit within the graph. We've avoided > adding generation numbers because of the storage/complexity issues. Hmmh. Storage: one int (or maybe less) per commit doesn't sound too bad. We can probably do without on bare repos by default. Complexity: Was that due to replace refs? Other than that, it seemed to be simple: max(parent generation numbers)+1. ... or can reachability bitmaps help??? Michael ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-04 15:06 ` Michael J Gruber @ 2015-03-04 18:05 ` Jeff King 2015-03-04 20:41 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2015-03-04 18:05 UTC (permalink / raw) To: Michael J Gruber; +Cc: Michal Hocko, git On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote: > > 3. Introduce a more trust-worthy mechanism for ordering commits. The > > timestamp here is really just a proxy for the oft-discussed > > "generation number" of the commit within the graph. We've avoided > > adding generation numbers because of the storage/complexity issues. > > Hmmh. > > Storage: one int (or maybe less) per commit doesn't sound too bad. We > can probably do without on bare repos by default. > > Complexity: Was that due to replace refs? Other than that, it seemed to > be simple: max(parent generation numbers)+1. Calculating them is simple. Caching and storage is the bigger question. When we do we generate them? Where do we store them? What do we do with replace-refs and grafts? I think the answers are "at repack time", "in an auxiliary file alongside the pack idx", and "we turn it off completely when these features are in use". See: http://thread.gmane.org/gmane.comp.version-control.git/214916 for a sample implementation. > ... or can reachability bitmaps help??? Sometimes. If you are asking about --contains traversals, then bitmaps can let you stop the traversal early. We have some patches that do this that are running in production at GitHub, but they are kind of gnarly. One of my goals is to clean them up and get them upstream. It's also part of why I didn't pursue the series above further. Making "--contains" faster is one goal, but making "rev-list --objects --all" faster was more important (since we do it for every fetch). And making commits faster is only half the equation there. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-04 18:05 ` Jeff King @ 2015-03-04 20:41 ` Junio C Hamano 2015-03-04 22:05 ` Mike Hommey 2015-03-05 5:12 ` Jeff King 0 siblings, 2 replies; 9+ messages in thread From: Junio C Hamano @ 2015-03-04 20:41 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Michal Hocko, git Jeff King <peff@peff.net> writes: > On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote: > >> Complexity: Was that due to replace refs? Other than that, it seemed to >> be simple: max(parent generation numbers)+1. > > Calculating them is simple. Caching and storage is the bigger question. Yes, also having to handle the ones whose generation numbers haven't been computed yet adds to the complexity. This one, and $gmane/264101, are a few instances of this known issue raised here recently. I have been wondering if we can do something along the following (these are not alternatives) as a cheaper workaround: (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all commands that create new commit objects. If the committer timestamp being used is older than any of the parent commits, "warn" or "reject" depending on the setting. Make 'reject' the default for commands that are purely local (i.e. recording your own progress by cherry-picking, committing, rebasing, reverting, etc.) and 'warn' the default for commands that merge other peoples' history that you may lack the power to rewind and correct (e.g. 'pull' and 'merge' from remote tracking refs). (2) Compute a bitmap whose timestamps are suspect when we pack to mark commits. When revision.c:limit_list() tries to see if there still are interesting commits, an UNINTERESTING commit marked as such shouldn't be counted as "not interesting because it is old enough". Use the same hint in the walker used in "describe --contains". ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-04 20:41 ` Junio C Hamano @ 2015-03-04 22:05 ` Mike Hommey 2015-03-05 5:12 ` Jeff King 1 sibling, 0 replies; 9+ messages in thread From: Mike Hommey @ 2015-03-04 22:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Michal Hocko, git On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote: > > > >> Complexity: Was that due to replace refs? Other than that, it seemed to > >> be simple: max(parent generation numbers)+1. > > > > Calculating them is simple. Caching and storage is the bigger question. > > Yes, also having to handle the ones whose generation numbers haven't > been computed yet adds to the complexity. > > This one, and $gmane/264101, are a few instances of this known issue > raised here recently. I have been wondering if we can do something > along the following (these are not alternatives) as a cheaper > workaround: > > (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all > commands that create new commit objects. If the committer > timestamp being used is older than any of the parent commits, > "warn" or "reject" depending on the setting. > > Make 'reject' the default for commands that are purely local > (i.e. recording your own progress by cherry-picking, > committing, rebasing, reverting, etc.) and 'warn' the default > for commands that merge other peoples' history that you may > lack the power to rewind and correct (e.g. 'pull' and 'merge' > from remote tracking refs). Please note that a common cause (for me, at least) of skewed timestamps is importing from a foreign VCS. Mike ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-04 20:41 ` Junio C Hamano 2015-03-04 22:05 ` Mike Hommey @ 2015-03-05 5:12 ` Jeff King 2015-03-05 6:00 ` Junio C Hamano 1 sibling, 1 reply; 9+ messages in thread From: Jeff King @ 2015-03-05 5:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael J Gruber, Michal Hocko, git On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote: > > Calculating them is simple. Caching and storage is the bigger question. > > Yes, also having to handle the ones whose generation numbers haven't > been computed yet adds to the complexity. I'm not sure it's that bad. If you cache generation numbers for all known commits when you repack, then worst case you have to traverse all commits not in the pack. > This one, and $gmane/264101, are a few instances of this known issue > raised here recently. If $gmane/264101 is caused by clock skew, I'd find that disturbing. Those algorithms are supposed to be "correct, but slower" in the face of skew, not ever incorrect. > I have been wondering if we can do something > along the following (these are not alternatives) as a cheaper > workaround: > > (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all > commands that create new commit objects. If the committer > timestamp being used is older than any of the parent commits, > "warn" or "reject" depending on the setting. I think this idea has come up before. If it's _your_ timestamp that is screwed up, this detects it, which is good. But if it's somebody else's timestamp that is screwed up, there's often not much you can do. It's already baked into the history. I don't mind it as an extra layer of protection, I guess. But my recollection of the great skew survey[1] is that most of these problems don't come from actual clock skew, but from software bugs or bogus data in imported commits. True skew is generally less than a day, and can be handled with a fixed slop time. [1] http://article.gmane.org/gmane.comp.version-control.git/159065 > (2) Compute a bitmap whose timestamps are suspect when we pack to > mark commits. When revision.c:limit_list() tries to see if > there still are interesting commits, an UNINTERESTING commit > marked as such shouldn't be counted as "not interesting because > it is old enough". Use the same hint in the walker used in > "describe --contains". If you see mismatched timestamps between a parent and child commit, it's often not clear which one is suspicious. Is the parent skewed to the future, or is the child skewed to the past? Which one do you mark as suspect? IMHO, if you are going to go to the trouble to detect and store skew, you should just go to the trouble to calculate and store reliable generation numbers. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git describe --contains doesn't work properly for a commit 2015-03-05 5:12 ` Jeff King @ 2015-03-05 6:00 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2015-03-05 6:00 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Michal Hocko, git Jeff King <peff@peff.net> writes: > On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote: > > >> This one, and $gmane/264101, are a few instances of this known issue >> raised here recently. > > If $gmane/264101 is caused by clock skew, I'd find that disturbing. > Those algorithms are supposed to be "correct, but slower" in the face of > skew, not ever incorrect. My understanding is that the commit painting done by merge-base is designed to be always correct, but the A..B revision traversal depends on SLOP being big enough. When the traversal queue is filled with all UNINTERESTING commits, some of which needs to be dug to reveal newer commits that are not yet painted as UNINTERESTING in order to get the complete picture, the still_interesting() logic will end up stopping prematurely, leaving commits that are actually UNINTERESTING in the topological sense unpainted in the resulting newlist that is assigned to revs->commits in limit_list(), yielding an incorrect result. >> > Calculating them is simple. Caching and storage is the bigger question. >> >> Yes, also having to handle the ones whose generation numbers haven't >> been computed yet adds to the complexity. > > I'm not sure it's that bad. If you cache generation numbers for all > known commits when you repack, then worst case you have to traverse all > commits not in the pack. > ... > IMHO, if you are going to go to the trouble to detect and store skew, > you should just go to the trouble to calculate and store reliable > generation numbers. I would actually prefer a solution with generation numbers, of course, because that would give us provably correct result. If it can be done without too much hassle, I am all for it. Scrap the half-baked "I've been wondering" compromise ;-) Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-03-05 6:00 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-02-26 13:35 git describe --contains doesn't work properly for a commit Michal Hocko 2015-02-26 14:23 ` Michal Hocko 2015-03-04 10:54 ` Jeff King 2015-03-04 15:06 ` Michael J Gruber 2015-03-04 18:05 ` Jeff King 2015-03-04 20:41 ` Junio C Hamano 2015-03-04 22:05 ` Mike Hommey 2015-03-05 5:12 ` Jeff King 2015-03-05 6:00 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox