From: Arnaud Morin <arnaud.morin@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Kevin Willford <kewillf@microsoft.com>
Subject: Re: rev-list with multiple commits sharing same patch-id
Date: Tue, 12 Jan 2021 15:34:38 +0000 [thread overview]
Message-ID: <20210112153438.GC32482@sync> (raw)
In-Reply-To: <X/28IXBpse2dNZQH@coredump.intra.peff.net>
So instead of:
id = has_commit_patch_id(commit, &ids);
We should use some sort of iterator, in order to call:
id->commit->object.flags |= cherry_flag;
for all commits which are related to this patch?
--
Arnaud Morin
On 12.01.21 - 10:11, Jeff King wrote:
> On Tue, Jan 12, 2021 at 09:17:37AM -0500, Jeff King wrote:
>
> > It looks like this was broken in v2.10.0, via dfb7a1b4d0 (patch-ids:
> > stop using a hand-rolled hashmap implementation, 2016-07-29).
> >
> > I think the issue is that it is allowing duplicate entries in the
> > hashmap. The algorithm is something like:
> >
> > - iterate over left-hand commits, inserting patch-id for each into
> > hashmap
> >
> > - iterate over right-hand commits, seeing if any are present in
> > hashmap. If so, we exclude the commit _and_ mark the patch-id as
> > "seen"
> >
> > - iterate again over left-hand commits, omitting any whose patch-ids
> > are marked as "seen"
> >
> > So if two commits on the left-hand side have the same patch-id, if we
> > insert two entries into the hashmap, then only one of them is going to
> > get its "seen" flag marked in the middle step.
>
> Yeah, that's definitely it. Here's what the fix would look like directly
> on top of dfb7a1b4d0:
>
> diff --git a/patch-ids.c b/patch-ids.c
> index db31fa647a..a8ed4f0872 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -66,12 +66,24 @@ struct patch_id *add_commit_patch_id(struct commit *commit,
> struct patch_ids *ids)
> {
> struct patch_id *key = xcalloc(1, sizeof(*key));
> + struct patch_id *old;
>
> if (init_patch_id_entry(key, commit, ids)) {
> free(key);
> return NULL;
> }
>
> + /*
> + * Don't allow duplicates. Note that we can't use hashmap_put here; we
> + * must retain the _old_ struct, because some commit->util is pointing
> + * to it.
> + */
> + old = hashmap_get(&ids->patches, key, ids);
> + if (old) {
> + free(key);
> + return old;
> + }
> +
> hashmap_add(&ids->patches, key);
> return key;
> }
> diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh
> index 28d4f6b259..0c5f1dcc54 100755
> --- a/t/t6007-rev-list-cherry-pick-file.sh
> +++ b/t/t6007-rev-list-cherry-pick-file.sh
> @@ -207,4 +207,16 @@ test_expect_success '--count --left-right' '
> test_cmp expect actual
> '
>
> +test_expect_success '--cherry-pick with duplicates on each side' '
> + git checkout -b dup-orig &&
> + test_commit dup-base &&
> + git revert dup-base &&
> + git cherry-pick dup-base &&
> + git checkout -b dup-side HEAD~3 &&
> + test_tick &&
> + git cherry-pick -3 dup-orig &&
> + git rev-list --cherry-pick dup-orig...dup-side >actual &&
> + test_must_be_empty actual
> +'
> +
> test_done
>
>
> Unfortunately we can't do the same thing now. The "seen" flag went away
> in 683f17ec44 (patch-ids: replace the seen indicator with a commit
> pointer, 2016-07-29), in favor of having each patch_id struct point to a
> "struct commit". We could revert that, or turn that pointer into a list,
> but it gets worse. Later, b3dfeebb92 (rebase: avoid computing
> unnecessary patch IDs, 2016-07-29) built on that by lazily computing the
> patch-ids. So each patch_id struct must point to exactly one commit!
>
> We'll have to either:
>
> - split them apart into two structs at the time of the lazy computation
>
> or
>
> - accept having duplicate patch_ids, but on lookup make sure we find
> _all_ matches, not just the first one.
>
> Hmm, that last one might not be too bad, since we have
> hashmap_get_next() (it does mean iterating over a chained bucket
> linearly, but that's already the worst-case for a hash lookup).
>
> -Peff
next prev parent reply other threads:[~2021-01-12 15:35 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-09 16:24 rev-list with multiple commits sharing same patch-id Arnaud Morin
2021-01-11 7:59 ` Arnaud Morin
2021-01-11 9:54 ` Christian Couder
2021-01-11 18:25 ` Arnaud Morin
2021-01-12 14:17 ` Jeff King
2021-01-12 15:11 ` Jeff King
2021-01-12 15:34 ` Arnaud Morin [this message]
2021-01-12 15:52 ` [PATCH] patch-ids: handle duplicate hashmap entries Jeff King
2021-01-12 16:24 ` Arnaud Morin
2021-01-12 19:13 ` Junio C Hamano
2021-01-13 9:24 ` Arnaud Morin
2021-01-13 12:59 ` Jeff King
2021-01-13 20:21 ` Junio C Hamano
2021-01-13 20:33 ` Jeff King
2021-01-13 19:28 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210112153438.GC32482@sync \
--to=arnaud.morin@gmail.com \
--cc=git@vger.kernel.org \
--cc=kewillf@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.