From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Jean-Jacques Lafay" <jeanjacques.lafay@gmail.com>,
"René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
msysgit@googlegroups.com, "Git List" <git@vger.kernel.org>,
"Philip Oakley" <philipoakley@iee.org>
Subject: Re: Re: [PATCH] git tag --contains : avoid stack overflow
Date: Tue, 13 Nov 2012 01:08:38 -0500 [thread overview]
Message-ID: <20121113060838.GD10995@sigill.intra.peff.net> (raw)
In-Reply-To: <7vwqxqf6li.fsf@alter.siamese.dyndns.org>
On Mon, Nov 12, 2012 at 08:51:37PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Yeah. We tolerate a certain amount of skew (24 hours for --name-rev, and
> > 5 broken commits in a row for --since). But the big ones are usually
> > software bugs (the big kernel ones were from broken "guilt", I think) or
> > broken imports (when I published a bunch of skew statistics last year,
> > the interesting ones were all imports; I don't know if they were
> > software bugs, or just garbage in, garbage out).
>
> I was hoping that 2e6bdd3 (test-generation: compute generation
> numbers and clock skews, 2012-09-04) may be a good first step to
> come up with a practical and cheap solution on top of it.
>
> The traversal can be fooled by clock skews when it sees a commit
> that has a timestamp that is older than it should, causing it to
> give up, incorrectly thinking that there won't be newer commits that
> it is interested in behind the problematic commit.
I wrote a similar skew-finding tool last year, though some of the
numbers it came up with were different (I remember having many fewer
skewed commits in the kernel repo).
One problem is that it identifies commits which behave badly with
certain algorithms, but it does not identify commits which are wrong.
If I skew backwards, it will find my commit. But if I skew forwards, it
will label my children as wrong.
> The logic implemented by the change is to identify these problematic
> commits, and we could record these commits with the value of the
> timestamps they should have had (e.g. the timestamp of the newest
> ancestor for each of these commits) in a notes tree. Then the
> traversal logic (commit-list-insert-by-date) could be updated use
> that "corrected" timestamp instead not to be fooled by the clock
> skew.
>
> Such a notes tree can be built once and updated by only "appending",
> as a commit will never acquire more ancestors in its parents chain
> once it is made.
>
> Is it too simplistic, or too costly? In git.git we have three such
> commits whose timestamp need to be corrected, while in the Linux
> kernel there were 2.2k skewed commits when I counted them a few
> months ago.
This came up in the big generations discussion last summer, and I think
I even implemented a proof of concept. I couldn't find the actual code,
though but only that I got "pleasing performance results using a notes
tree to store a list of commits with bogus timestamps":
http://article.gmane.org/gmane.comp.version-control.git/161101
It is a little wasteful in space if you have a lot of skewed commits
(the notes tree stores a 160-bit hash pointing to a blob object storing
a 32-bit integer).
My personal preference at this point would be:
1. introduce an auxiliary metadata file that would live alongside the
pack index and contain generation numbers
2. generate the metadata file during pack indexing.
3. If we have a generation metadata file, but a particular object is
not in it, compute the generation; this should be quick because we
will hit a file with a stored generation eventually
4. If we do not have any generation metadata files, or if grafts or
replace objects are in use, do not use cutoffs in algorithms. Be
safe but slow.
On the other hand, just switching to doing a single traversal instead of
one merge-base computation per tag already got rid of the really awful
performance cases. Nobody has complained since that went in, so maybe
nobody cares about shaving a few seconds per operation down to a few
tens of milliseconds. The real win was shaving tens of seconds down to a
few seconds.
-Peff
--
*** Please reply-to-all at all times ***
*** (do not pretend to know who is subscribed and who is not) ***
*** Please avoid top-posting. ***
The msysGit Wiki is here: https://github.com/msysgit/msysgit/wiki - Github accounts are free.
You received this message because you are subscribed to the Google
Groups "msysGit" group.
To post to this group, send email to msysgit@googlegroups.com
To unsubscribe from this group, send email to
msysgit+unsubscribe@googlegroups.com
For more options, and view previous threads, visit this group at
http://groups.google.com/group/msysgit?hl=en_US?hl=en
prev parent reply other threads:[~2012-11-13 6:08 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1352568970-4669-1-git-send-email-jeanjacques.lafay@gmail.com>
2012-11-10 20:00 ` [PATCH] git tag --contains : avoid stack overflow Philip Oakley
2012-11-10 21:13 ` Jean-Jacques Lafay
2012-11-10 21:33 ` Pat Thoyts
2012-11-11 16:46 ` René Scharfe
2012-11-11 16:54 ` Jeff King
2012-11-11 23:10 ` Johannes Schindelin
2012-11-12 22:27 ` Jean-Jacques Lafay
2012-11-12 23:14 ` Jeff King
2012-11-13 1:16 ` Johannes Schindelin
2012-11-13 3:46 ` [msysGit] " Jeff King
2012-11-13 4:01 ` Johannes Schindelin
2012-11-13 4:05 ` Jeff King
2012-11-13 4:52 ` Johannes Schindelin
2012-11-13 4:51 ` Junio C Hamano
2012-11-13 6:08 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121113060838.GD10995@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeanjacques.lafay@gmail.com \
--cc=msysgit@googlegroups.com \
--cc=philipoakley@iee.org \
--cc=rene.scharfe@lsrfire.ath.cx \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).