Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Keeping <john@keeping.me.uk>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] patch-ids.c: cache patch IDs in a notes tree
Date: Sun, 12 May 2013 09:59:34 +0100	[thread overview]
Message-ID: <20130512085934.GG2299@serenity.lan> (raw)
In-Reply-To: <7v1u9cx5pf.fsf@alter.siamese.dyndns.org>

On Sat, May 11, 2013 at 08:00:44PM -0700, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Sat, May 11, 2013 at 2:49 PM, John Keeping <john@keeping.me.uk> wrote:
> >>
> >> Hmm... I hadn't realised that.  Looking a bit closer, it looks like
> >> init_patch_ids sets up its own diffopts so its not affected by the
> >> command line (except for pathspecs which would be easy to check for).
> >> Of course that still means it can be affected by settings in the user's
> >> configuration.
> >
> > .. and in the actual diff algorithm.
> 
> As to the "objection" side of the argument, I already said
> essentially the same thing several months ago:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/202654/focus=202898
> 
> and do not have much to add [*1*].
> 
> However.
> 
> The use of patch-id in cherry and rebase is to facilitate avoiding
> to replay commits that are obviously identical to the ones you have
> in your history.  The cached patch id for an existing old commit may
> differ from a patch id you freshly compute for a new commit you are
> trying to see if it truly new, even though they may represent the
> same change.  So we may incorrectly think such a new commit is not
> yet in your history and attempt to replay it.
> 
> But it is not a big problem.  Either 3-way merge notices that there
> is nothing new, or you get a conflict and have chance to inspect
> what is going on.

It's not a problem here, but false negatives would be annoying if you're
looking at "git log --cherry-mark".

> A conceptually much larger and more problematic issue is that we may
> discard a truly new change that you still need as an old one you
> already have due to a hash collision and discard it.  Because the
> hash space of SHA-1 is so large, however, it is not a problem in
> practice, and more importantly, that hash space is just as large as
> the hash space used by Git to reduce a patch to a patch id, the
> filtering done with patch-id in cherry and rebase _already_ have
> that exact problem with or without this additional cache layer. A
> stale cache may make the possibility of lost change due to such a
> hash collision merely twice as likely.
> 
> > ... it's a "the patch ID actually ignores a lot of data in order
> > to give the same ID even if lins have been added above it, and the
> > patch is at different line numbers etc".
> 
> Yes.
> 
> > So maybe it doesn't matter. But at the same time, I really think
> > caching patch ID's should be something people should be aware of is
> > fundamentally wrong, even if it might work.
> 
> I do not think it is "caching patch ID" that people should be aware
> of is fundamentally wrong.  What is fundamentally wrong, even if it
> might work, is "using patch ID" itself.
> 
> > And quite frankly, if you do rebases etc so much that you think patch
> > ID's are so important that they need to be cached, you may be doing
> > odd/wrong things.
> 
> And that, too ;-)

I've never noticed a problem with rebases, it's when I use "git log
--cherry master..." to see if patches I've sent to a mailing list have
been picked up.

To take Git as an example (albeit a bad one because "What's Cooking" is
a more useful way to track patch state here), if I compare this patch to
pu I have:

	$ git rev-list --left-right --count pu...
	234	1

and caching patch IDs takes that from ~0.6s to ~0.1s.  When doing that
over several branches consecutively that makes a big difference to the
overall runtime, especially because most of the commits of interest will
be cached during the first one.

next prev parent reply	other threads:[~2013-05-12  8:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-11 19:54 [PATCH] patch-ids.c: cache patch IDs in a notes tree John Keeping
2013-05-11 21:10 ` Linus Torvalds
2013-05-11 21:49   ` John Keeping
2013-05-11 22:41     ` Linus Torvalds
2013-05-11 23:57       ` Johannes Schindelin
2013-05-12  9:08         ` John Keeping
2013-05-12 11:41           ` [RFC/PATCH v2] patch-ids: " John Keeping
2013-05-12 11:57             ` John Keeping
2013-05-12  3:00       ` [PATCH] patch-ids.c: " Junio C Hamano
2013-05-12  8:59         ` John Keeping [this message]
2013-05-12 22:19           ` Junio C Hamano
2013-05-13  7:59             ` John Keeping
2013-05-13 13:53               ` Junio C Hamano
2013-05-13 14:02                 ` John Keeping
2013-05-13 14:46                   ` Junio C Hamano
2013-05-13 14:59                     ` John Keeping
2013-05-13 15:45                       ` Junio C Hamano
2013-05-13 15:52                         ` John Keeping

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130512085934.GG2299@serenity.lan \
    --to=john@keeping.me.uk \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.