git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] rev-list: new --cherry-pick=loose option
@ 2011-02-21 19:49 Jay Soffian
  2011-02-22  0:16 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Jay Soffian @ 2011-02-21 19:49 UTC (permalink / raw)
  To: git; +Cc: Jay Soffian

rev-list --cherry-pick may be used in combination with the
symmetric difference operator (A...B) to squelch showing commits
which are identical in content.

Sometimes this is too strict of a test, for example if the
cherry-picked commits required conflict resolution, altering
their patch-ids. However, it is still useful to be able to squelch
such commits as they need not be cherry-picked again.

Using --cherry-pick=loose tells git to ignore the patch content
and instead examine its metadata. Specifically, commits with
identical authorship (name and timestamp) and subject are not
shown in the output.

Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
---

This is a bit of a hack just to stimulate conversation. My use
case for this is that I'm cherry-picking commits from a release branch
onto a development branch. (I cannot use a different branching strategy
that would allow me to merge instead.) Sometimes those cherry-picks require
conflict resolution. The current --cherry-pick behavior requires that I
maintain (externally to git) a list of commits which have already been
cherry-picked.

So I started experimenting with a wrapper around rev-list A...B that
looks at commit metadata to determine what's been cherry-picked and
what not, and it works quite well for me.

The functionality in this patch would let me get rid of that wrapper.

Thoughts?

 patch-ids.c |   23 +++++++++++++++++++++--
 patch-ids.h |    2 +-
 revision.c  |    9 +++++++++
 revision.h  |    1 +
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/patch-ids.c b/patch-ids.c
index 5717257..6a1f3e3 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -16,6 +16,20 @@ static int commit_patch_id(struct commit *commit, struct diff_options *options,
 	return diff_flush_patch_id(options, sha1);
 }
 
+static int commit_patch_id_loose(struct commit *commit, unsigned char *sha1)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	struct pretty_print_context pctx = {0};
+
+	git_SHA1_Init(&ctx);
+	format_commit_message(commit, "%an %ae %at %s", &buf, &pctx);
+	git_SHA1_Update(&ctx, buf.buf, buf.len);
+	strbuf_release(&buf);
+	git_SHA1_Final(sha1, &ctx);
+	return 0;
+}
+
 static const unsigned char *patch_id_access(size_t index, void *table)
 {
 	struct patch_id **id_table = table;
@@ -65,8 +79,13 @@ static struct patch_id *add_commit(struct commit *commit,
 	unsigned char sha1[20];
 	int pos;
 
-	if (commit_patch_id(commit, &ids->diffopts, sha1))
-		return NULL;
+	if (ids->loose) {
+		if (commit_patch_id_loose(commit, sha1))
+			return NULL;
+	} else {
+		if (commit_patch_id(commit, &ids->diffopts, sha1))
+			return NULL;
+	}
 	pos = patch_pos(ids->table, ids->nr, sha1);
 	if (0 <= pos)
 		return ids->table[pos];
diff --git a/patch-ids.h b/patch-ids.h
index c8c7ca1..f54b0ee 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -8,7 +8,7 @@ struct patch_id {
 
 struct patch_ids {
 	struct diff_options diffopts;
-	int nr, alloc;
+	int nr, alloc, loose;
 	struct patch_id **table;
 	struct patch_id_bucket *patches;
 };
diff --git a/revision.c b/revision.c
index 7b9eaef..6dbbf5d 100644
--- a/revision.c
+++ b/revision.c
@@ -558,6 +558,8 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 		ids.diffopts.paths = revs->diffopt.paths;
 		ids.diffopts.pathlens = revs->diffopt.pathlens;
 	}
+	if (revs->cherry_pick_loose)
+		ids.loose = 1;
 
 	/* Compute patch-ids for one side */
 	for (p = list; p; p = p->next) {
@@ -1268,6 +1270,13 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--cherry-pick")) {
 		revs->cherry_pick = 1;
 		revs->limited = 1;
+	} else if (!prefixcmp(arg, "--cherry-pick=")) {
+		revs->cherry_pick = 1;
+		revs->limited = 1;
+		if (!strcmp(arg+14, "loose"))
+			revs->cherry_pick_loose = 1;
+		else if (strcmp(arg+14, "strict"))
+			return error("bad --cherry-pick argument");
 	} else if (!strcmp(arg, "--objects")) {
 		revs->tag_objects = 1;
 		revs->tree_objects = 1;
diff --git a/revision.h b/revision.h
index 05659c6..3f94b27 100644
--- a/revision.h
+++ b/revision.h
@@ -66,6 +66,7 @@ struct rev_info {
 			reverse:1,
 			reverse_output_stage:1,
 			cherry_pick:1,
+			cherry_pick_loose:1,
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1;
-- 
1.7.4.1.51.g2cc0c.dirty

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] rev-list: new --cherry-pick=loose option
  2011-02-21 19:49 [RFC/PATCH] rev-list: new --cherry-pick=loose option Jay Soffian
@ 2011-02-22  0:16 ` Junio C Hamano
  2011-02-22  2:54   ` Jay Soffian
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2011-02-22  0:16 UTC (permalink / raw)
  To: Jay Soffian; +Cc: git

Jay Soffian <jaysoffian@gmail.com> writes:

> Thoughts?

I am personally not very interested in this particular "author, timestamp,
title and nothing else" implementation, as that is probably too loose (in
many projects, title by itself is not descriptive enough) to be safe.
Also people would probably want other loose modes with varying degree
(e.g. throwing in the list of touched paths to your mix might make it a
bit safer), so "loose" feels a bit too broad a word to give to this
particular implementation (iow, it does not say in what way it is loose).

> @@ -65,8 +79,13 @@ static struct patch_id *add_commit(struct commit *commit,
>  	unsigned char sha1[20];
>  	int pos;
>  
> -	if (commit_patch_id(commit, &ids->diffopts, sha1))
> -		return NULL;
> +	if (ids->loose) {
> +		if (commit_patch_id_loose(commit, sha1))
> +			return NULL;
> +	} else {
> +		if (commit_patch_id(commit, &ids->diffopts, sha1))
> +			return NULL;
> +	}

If the purpose of the patch is to stir the discussion, it is fine to have
any crappy "here is a strawman" algorithm as an example of an alternative
patch ID computation, but one thing it _should_ do well is to show where
the necessary change should be hooked into, and I think the above "if"
statement is placed in a wrong function.  If you change commit_patch_id()
to take a pointer to the whole ids instead of just &ids->diffopts, it can
decide how the "commit patch ID" is computed without affecting the
callers, no?

And then we could instead introduce patch-id-algo=<foo>, and instead of
"loose" call this particular algorithm "authorship-subject" or something.
Coming up with a pluggable interface to let the end user compute patch
equivalence might be a plus.

Certain patch equivalence might not be easy to define by "do they hash to
the same small value" but by "here are two patches---compare them and tell
me if they are equivalent".  If we can update the code to support that
kind of patch equivalence it would be great, but it is not within the
reach/scope of this patch (not a complaint, but something you may want to
tackle next).

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] rev-list: new --cherry-pick=loose option
  2011-02-22  0:16 ` Junio C Hamano
@ 2011-02-22  2:54   ` Jay Soffian
  0 siblings, 0 replies; 3+ messages in thread
From: Jay Soffian @ 2011-02-22  2:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Feb 21, 2011 at 7:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jay Soffian <jaysoffian@gmail.com> writes:
>
>> Thoughts?
>
> I am personally not very interested in this particular "author, timestamp,
> title and nothing else" implementation, as that is probably too loose (in
> many projects, title by itself is not descriptive enough) to be safe.

Well, right, it's not title by itself, it's title + authorship. I am
scratching my head at a project that would have commits identical in
author, timestamp, and title.

> Also people would probably want other loose modes with varying degree

Initially I thought this too and was going to let it take pretty-print
specifiers (--cherry-pick=format:...), but I can't see which other
metadata would be useful. Maybe the commit body, but if there were
conflicts, that's something that would have very likely been changed
to add a Conflicts: section.

> (e.g. throwing in the list of touched paths to your mix might make it a
> bit safer), so "loose" feels a bit too broad a word to give to this
> particular implementation (iow, it does not say in what way it is loose).

Throwing in touched paths does make it safer, but for my use case is
too strict for some commits (I sometimes have to account for file
renames when I cherry-pick).

>> @@ -65,8 +79,13 @@ static struct patch_id *add_commit(struct commit *commit,
>>       unsigned char sha1[20];
>>       int pos;
>>
>> -     if (commit_patch_id(commit, &ids->diffopts, sha1))
>> -             return NULL;
>> +     if (ids->loose) {
>> +             if (commit_patch_id_loose(commit, sha1))
>> +                     return NULL;
>> +     } else {
>> +             if (commit_patch_id(commit, &ids->diffopts, sha1))
>> +                     return NULL;
>> +     }
>
> If the purpose of the patch is to stir the discussion, it is fine to have
> any crappy "here is a strawman" algorithm as an example of an alternative
> patch ID computation, but one thing it _should_ do well is to show where
> the necessary change should be hooked into, and I think the above "if"
> statement is placed in a wrong function.  If you change commit_patch_id()
> to take a pointer to the whole ids instead of just &ids->diffopts, it can
> decide how the "commit patch ID" is computed without affecting the
> callers, no?

Good idea.

> And then we could instead introduce patch-id-algo=<foo>, and instead of
> "loose" call this particular algorithm "authorship-subject" or something.
> Coming up with a pluggable interface to let the end user compute patch
> equivalence might be a plus.

I think it's premature to make it pluggable. I was hoping that with
this mode (loose) and the existing mode (strict), the extremes would
be covered. And then we'd wait and see what latent demand might exist
for other modes. :-)

I just don't feel like I know enough about what "the end user" wants
to come up with a reasonably general scheme.

(The best I can come up with that's pluggable and not horribly
expensive is something like open bi-directional pipes to an external
process that gets fed diffs and returns patch-ids.)

> Certain patch equivalence might not be easy to define by "do they hash to
> the same small value" but by "here are two patches---compare them and tell
> me if they are equivalent".  If we can update the code to support that
> kind of patch equivalence it would be great, but it is not within the
> reach/scope of this patch (not a complaint, but something you may want to
> tackle next).

More than I can bite off at the moment.

j.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-02-22  2:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-21 19:49 [RFC/PATCH] rev-list: new --cherry-pick=loose option Jay Soffian
2011-02-22  0:16 ` Junio C Hamano
2011-02-22  2:54   ` Jay Soffian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).