From: John Keeping <john@keeping.me.uk>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Git Mailing List <git@vger.kernel.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree
Date: Sun, 12 May 2013 09:59:34 +0100 [thread overview]
Message-ID: <20130512085934.GG2299@serenity.lan> (raw)
In-Reply-To: <7v1u9cx5pf.fsf@alter.siamese.dyndns.org>
On Sat, May 11, 2013 at 08:00:44PM -0700, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > On Sat, May 11, 2013 at 2:49 PM, John Keeping <john@keeping.me.uk> wrote:
> >>
> >> Hmm... I hadn't realised that. Looking a bit closer, it looks like
> >> init_patch_ids sets up its own diffopts so its not affected by the
> >> command line (except for pathspecs which would be easy to check for).
> >> Of course that still means it can be affected by settings in the user's
> >> configuration.
> >
> > .. and in the actual diff algorithm.
>
> As to the "objection" side of the argument, I already said
> essentially the same thing several months ago:
>
> http://thread.gmane.org/gmane.comp.version-control.git/202654/focus=202898
>
> and do not have much to add [*1*].
>
> However.
>
> The use of patch-id in cherry and rebase is to facilitate avoiding
> to replay commits that are obviously identical to the ones you have
> in your history. The cached patch id for an existing old commit may
> differ from a patch id you freshly compute for a new commit you are
> trying to see if it truly new, even though they may represent the
> same change. So we may incorrectly think such a new commit is not
> yet in your history and attempt to replay it.
>
> But it is not a big problem. Either 3-way merge notices that there
> is nothing new, or you get a conflict and have chance to inspect
> what is going on.
It's not a problem here, but false negatives would be annoying if you're
looking at "git log --cherry-mark".
> A conceptually much larger and more problematic issue is that we may
> discard a truly new change that you still need as an old one you
> already have due to a hash collision and discard it. Because the
> hash space of SHA-1 is so large, however, it is not a problem in
> practice, and more importantly, that hash space is just as large as
> the hash space used by Git to reduce a patch to a patch id, the
> filtering done with patch-id in cherry and rebase _already_ have
> that exact problem with or without this additional cache layer. A
> stale cache may make the possibility of lost change due to such a
> hash collision merely twice as likely.
>
> > ... it's a "the patch ID actually ignores a lot of data in order
> > to give the same ID even if lins have been added above it, and the
> > patch is at different line numbers etc".
>
> Yes.
>
> > So maybe it doesn't matter. But at the same time, I really think
> > caching patch ID's should be something people should be aware of is
> > fundamentally wrong, even if it might work.
>
> I do not think it is "caching patch ID" that people should be aware
> of is fundamentally wrong. What is fundamentally wrong, even if it
> might work, is "using patch ID" itself.
>
> > And quite frankly, if you do rebases etc so much that you think patch
> > ID's are so important that they need to be cached, you may be doing
> > odd/wrong things.
>
> And that, too ;-)
I've never noticed a problem with rebases, it's when I use "git log
--cherry master..." to see if patches I've sent to a mailing list have
been picked up.
To take Git as an example (albeit a bad one because "What's Cooking" is
a more useful way to track patch state here), if I compare this patch to
pu I have:
$ git rev-list --left-right --count pu...
234 1
and caching patch IDs takes that from ~0.6s to ~0.1s. When doing that
over several branches consecutively that makes a big difference to the
overall runtime, especially because most of the commits of interest will
be cached during the first one.
next prev parent reply other threads:[~2013-05-12 8:59 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-11 19:54 [PATCH] patch-ids.c: cache patch IDs in a notes tree John Keeping
2013-05-11 21:10 ` Linus Torvalds
2013-05-11 21:49 ` John Keeping
2013-05-11 22:41 ` Linus Torvalds
2013-05-11 23:57 ` Johannes Schindelin
2013-05-12 9:08 ` John Keeping
2013-05-12 11:41 ` [RFC/PATCH v2] patch-ids: " John Keeping
2013-05-12 11:57 ` John Keeping
2013-05-12 3:00 ` [PATCH] patch-ids.c: " Junio C Hamano
2013-05-12 8:59 ` John Keeping [this message]
2013-05-12 22:19 ` Junio C Hamano
2013-05-13 7:59 ` John Keeping
2013-05-13 13:53 ` Junio C Hamano
2013-05-13 14:02 ` John Keeping
2013-05-13 14:46 ` Junio C Hamano
2013-05-13 14:59 ` John Keeping
2013-05-13 15:45 ` Junio C Hamano
2013-05-13 15:52 ` John Keeping
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130512085934.GG2299@serenity.lan \
--to=john@keeping.me.uk \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).