Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree

git.vger.kernel.org archive mirror

From: John Keeping <john@keeping.me.uk>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree
Date: Sun, 12 May 2013 09:59:34 +0100
Message-ID: <20130512085934.GG2299@serenity.lan>
In-Reply-To: <7v1u9cx5pf.fsf@alter.siamese.dyndns.org>

On Sat, May 11, 2013 at 08:00:44PM -0700, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Sat, May 11, 2013 at 2:49 PM, John Keeping <john@keeping.me.uk> wrote:
> >>
> >> Hmm... I hadn't realised that.  Looking a bit closer, it looks like
> >> init_patch_ids sets up its own diffopts, so it's not affected by the
> >> command line (except for pathspecs, which would be easy to check for).
> >> Of course that still means it can be affected by settings in the user's
> >> configuration.
> >
> > .. and in the actual diff algorithm.
> 
> As to the "objection" side of the argument, I already said
> essentially the same thing several months ago:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/202654/focus=202898
> 
> and do not have much to add [*1*].
> 
> However.
> 
> The use of patch-id in cherry and rebase is to avoid replaying
> commits that are obviously identical to the ones you already have
> in your history.  The cached patch id for an existing old commit may
> differ from a patch id you freshly compute for a new commit when
> checking whether it is truly new, even though both may represent the
> same change.  So we may incorrectly think such a new commit is not
> yet in your history and attempt to replay it.
> 
> But it is not a big problem.  Either 3-way merge notices that there
> is nothing new, or you get a conflict and have a chance to inspect
> what is going on.

It's not a problem here, but false negatives would be annoying if you're
looking at "git log --cherry-mark".
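To make the false-negative case concrete, here is a minimal Python sketch of a simplified patch ID. It follows the documented idea behind git patch-id (a SHA-1 over the diff with whitespace and line numbers ignored) but is not Git's implementation; the function name patch_id, the ignore_whitespace flag and the sample diffs are hypothetical. The flag stands in for whatever setting (configuration, diff algorithm) might differ between the run that filled the cache and a later fresh computation.

```python
import hashlib


def patch_id(diff_text: str, ignore_whitespace: bool = True) -> str:
    """Simplified, hypothetical patch ID: SHA-1 over the diff body.

    Hunk headers ("@@ -10,1 +10,1 @@") are skipped so that the same change
    at different line numbers gets the same ID.  This is NOT Git's exact
    algorithm, only an illustration of the idea.
    """
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue  # drop line-number information
        if ignore_whitespace:
            line = "".join(line.split())  # remove all whitespace
        h.update(line.encode("utf-8") + b"\n")
    return h.hexdigest()


# The same change, once at line 10 and once after lines were added above it.
old_diff = """--- a/foo.c
+++ b/foo.c
@@ -10,1 +10,1 @@
-int x = 1;
+int x = 2;
"""
moved_diff = old_diff.replace("@@ -10,1 +10,1 @@", "@@ -42,1 +45,1 @@")

# Fresh IDs agree even though the line numbers differ.
print(patch_id(old_diff) == patch_id(moved_diff))  # True

# A stale cached ID computed with different settings does not match,
# so the old commit would wrongly look "not yet applied".
stale = patch_id(old_diff, ignore_whitespace=False)
print(stale == patch_id(moved_diff))  # False
```

In a rebase this mismatch only costs a redundant replay, but with --cherry-mark it shows up as a commit marked "+" that should have been marked "=".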

> A conceptually much larger and more problematic issue is that, due
> to a hash collision, we may mistake a truly new change that you
> still need for an old one you already have and discard it.  Because
> the hash space of SHA-1 is so large, however, this is not a problem
> in practice.  More importantly, since that hash space is just as
> large as the one Git uses to reduce a patch to a patch id, the
> filtering done with patch-id in cherry and rebase _already_ has that
> exact problem, with or without this additional cache layer.  A stale
> cache may merely make a lost change due to such a hash collision
> twice as likely.
> 
> > ... it's a "the patch ID actually ignores a lot of data in order
> > to give the same ID even if lines have been added above it, and the
> > patch is at different line numbers etc".
> 
> Yes.
> 
> > So maybe it doesn't matter. But at the same time, I really think
> > people should be aware that caching patch IDs is fundamentally
> > wrong, even if it might work.
> 
> I do not think it is "caching patch ID" that people should recognize
> as fundamentally wrong.  What is fundamentally wrong, even if it
> might work, is "using patch ID" itself.
> 
> > And quite frankly, if you do rebases etc. so much that you think patch
> > IDs are so important that they need to be cached, you may be doing
> > odd/wrong things.
> 
> And that, too ;-)
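The "twice as likely" remark in the quoted message can be checked with back-of-the-envelope arithmetic. The sketch below assumes SHA-1's 160-bit output behaves like a uniformly random value, and models a stale cache as giving an old commit two possible IDs (the cached one and a fresh one) that a new patch could collide with; the function name is hypothetical.

```python
from fractions import Fraction

HASH_BITS = 160  # SHA-1 digest size in bits


def collision_probability(ids_per_old_commit: int) -> Fraction:
    """At most this chance that one new patch's ID equals any of the IDs
    recorded for one old commit, treating SHA-1 output as uniformly
    random (a modeling assumption)."""
    return Fraction(ids_per_old_commit, 2 ** HASH_BITS)


fresh_only = collision_probability(1)        # no cache: one ID per old commit
with_stale_cache = collision_probability(2)  # stale cache: cached ID + fresh ID

print(with_stale_cache / fresh_only)  # 2
print(float(fresh_only))              # roughly 6.8e-49
```

Doubling a probability that small still leaves it negligible, which is the point being made about collisions not mattering in practice.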

I've never noticed a problem with rebases; it's when I use "git log
--cherry master..." to see whether patches I've sent to a mailing list
have been picked up.
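For reference, git log --cherry is documented as shorthand for --right-only --cherry-mark --no-merges, so with master... it lists the commits on the current branch and marks with "=" those whose patch ID matches a commit on master, and with "+" the rest. A minimal Python model of that marking step, assuming the patch IDs have already been computed (the function, dictionaries and commit names are hypothetical), looks like this:

```python
from __future__ import annotations


def cherry_mark(upstream: dict[str, str], ours: dict[str, str]) -> list[tuple[str, str]]:
    """Mark each of our commits "=" if some upstream commit has the same
    patch ID, "+" otherwise.  Both arguments map commit name -> patch ID."""
    upstream_ids = set(upstream.values())
    return [("=" if pid in upstream_ids else "+", commit)
            for commit, pid in ours.items()]


# Hypothetical data: "mine1" was applied upstream as "u2" (same patch ID).
upstream = {"u1": "id-a", "u2": "id-b"}
ours = {"mine1": "id-b", "mine2": "id-c"}
print(cherry_mark(upstream, ours))  # [('=', 'mine1'), ('+', 'mine2')]
```

In this model every commit on both sides needs a patch ID before any marking can happen.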

To take Git as an example (albeit a bad one because "What's Cooking" is
a more useful way to track patch state here), if I compare this patch
against pu, I get:

	$ git rev-list --left-right --count pu...
	234	1

and caching patch IDs takes that from ~0.6s to ~0.1s.  When doing that
over several branches consecutively, that makes a big difference to the
overall runtime, especially because most of the commits of interest will
be cached during the first one.
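The saving across several branches comes from reuse: once the upstream commits' IDs are stored, later comparisons only compute IDs for commits not seen before. The following Python sketch uses an in-memory dictionary as a stand-in for the notes tree the patch writes to; the class name, the fake_patch_id placeholder and the branch names are hypothetical. It keys entries by commit alone, which is why the objections earlier in the thread apply: an entry records nothing about the diff settings it was computed with.

```python
import hashlib
from typing import Callable, Dict


class PatchIdCache:
    """Hypothetical in-memory stand-in for a persistent commit -> patch ID cache."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._ids: Dict[str, str] = {}
        self.misses = 0  # how many IDs actually had to be computed

    def get(self, commit: str) -> str:
        # Keyed by commit only: nothing records which diff settings were used.
        if commit not in self._ids:
            self.misses += 1
            self._ids[commit] = self._compute(commit)
        return self._ids[commit]


def fake_patch_id(commit: str) -> str:
    """Placeholder for the expensive diff-and-hash step."""
    return hashlib.sha1(commit.encode("utf-8")).hexdigest()


cache = PatchIdCache(fake_patch_id)
upstream = [f"u{i}" for i in range(234)]  # shared upstream side, e.g. pu
for branch in (["a1"], ["b1", "b2"], ["c1"]):
    for commit in upstream + branch:
        cache.get(commit)

# 234 upstream IDs computed once, plus one per branch-only commit.
print(cache.misses)  # 238
```

Without the cache the same loop would compute 234 upstream IDs for every branch compared.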
