git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Edmundo Carmona Antoranz <eantoranz@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC] introducing git replay
Date: Wed, 13 Apr 2022 10:48:48 -0700	[thread overview]
Message-ID: <xmqqlew898n3.fsf@gitster.g> (raw)
In-Reply-To: <20220413164336.101390-1-eantoranz@gmail.com> (Edmundo Carmona Antoranz's message of "Wed, 13 Apr 2022 18:43:35 +0200")

Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

I've already gave my take on "is this interesting?" question with a
"not really", but let's look at the code, as the expertise will
translate to your future contributions easily, even if this
particular code may turn out not to be used in the project.

> +static unsigned int replay_indexof(struct commit *commit,
> +				   struct commit_list *list)
> +{
> +	int res;
> +	if (list == NULL)
> +		return -1;

We encourage to have a blank line between the end of the decls and
the beginning of the statements.  Add one after the line that
declares the variable "res".  

In an existing code that consistently uses the abbreviation, it is a
different story, but "result" is not too long to spell out, and
because you do not use the variable that often anyway, you would
avoid unnecessary friction on the readers not to abbreviate it to
"res" here.

> +	if (!oidcmp(&list->item->object.oid,
> +		    &commit->object.oid))

	if (!oidcmp(&list->item->object.oid, &commit->object.oid))

is 66 columns wide and this line is wrapped too short, without an
apparent upside to make the result any easier to read.

> +		return 0;
> +	res = replay_indexof(commit, list->next);

Do we need to go recursive here?  It feels wasteful, compared to
iteratively doing this.  FWIW, is the singly chained commit_list the
best data structure if you have "a collection of commits, among
which you'd need to find an existing one, if any"?  You may want to
consider using a hashset, perhaps, as the only thing you seem to be
getting out of the data structure is "is this among the old_commits
set?"  Another possibility, if you do not call APIs that use the
object flags, may be to allocate a flag bit and mark these commits
in the original history you discover via get_revision(), instead of
placing them in a singly chained commit_list structure.  Then the
loop in replay_commit() can just see if parent->object.flags has
that "in the original history?" bit set to decide.

> +	return res < 0 ? res : res + 1;
> +}
> +
> +static struct commit *replay_find_commit(const char *name)
> +{
> +	struct commit *commit = lookup_commit_reference_by_name(name);
> +	if (!commit)
> +		die(_("no such branch/commit '%s'"), name);
> +	return commit;
> +}
> +
> +static struct commit* replay_commit(struct commit * orig_commit)

In our codebase, an asterisk sticks to the identifier, not the type.

> +{
> +	struct pretty_print_context ctx = {0};
> +	struct strbuf body = STRBUF_INIT;
> +	struct strbuf author = STRBUF_INIT;
> +	struct strbuf committer = STRBUF_INIT;
> +	struct object_id new_commit_oid;
> +	struct commit *new_commit;
> +
> +	struct commit_list *new_parents_head = NULL;
> +	struct commit_list **new_parents = &new_parents_head;
> +	struct commit_list *parents = orig_commit->parents;
> +	while (parents) {

It is not _wrong_ to have a blank line inside the decl block if
there is a logical separation between the groups.  I am not sure if
the one in the above after "struct commit *new_commit" qualifies as
one.

Regardless, after such a large decl block, we want a blank line
before the first statement.

> +		struct commit *parent = parents->item;
> +		int commit_index;
> +		struct commit *new_parent;
> +
> +		commit_index = replay_indexof(parent, old_commits);
> +
> +		if (commit_index < 0)
> +			 // won't be replayed, use the original parent

We still frown upon // comments in this codebase.

> +			new_parent = parent;
> +		else {
> +			// it might have been translated already
> +			if (!new_commits[commit_index])
> +				new_commits[commit_index] = replay_commit(parent);
> +			new_parent = new_commits[commit_index];
> +		}
> +		new_parents = commit_list_append(new_parent, new_parents);
> +		parents = parents->next;
> +	}
> +
> +	format_commit_message(orig_commit, "%B", &body, &ctx);
> +	// TODO timezones
> +	format_commit_message(orig_commit, "%an <%ae> %at +0000", &author, &ctx);
> +	// TODO consider committer (control with an option)
> +	format_commit_message(orig_commit, "%cn <%ce> %ct +0000", &committer, &ctx);

Yuck.  Shouldn't this code, whose purpose is to replay the messages
and metadata of the commit as faithfully as possible while
reparenting them, be just reusing the original commit object?

Perhaps turning save_commit_buffer on, reading the original commit
object in the raw format, rewriting the parent pointers (and nothing
else) and finally calling write_object_file() to create the new
commit object is what this code should be doing instead.

And that would be another reason why this is probably better done as
fast-export piped to fast-import, but this message is not about the
design of the "feature" itself, so let me stop talking about that.

  parent reply	other threads:[~2022-04-13 17:48 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 16:43 [RFC] introducing git replay Edmundo Carmona Antoranz
2022-04-13 17:05 ` Junio C Hamano
2022-04-15 18:46   ` Edmundo Carmona Antoranz
2022-04-15 20:33     ` Junio C Hamano
2022-04-16  5:35       ` Edmundo Carmona Antoranz
2022-04-16  6:39         ` Junio C Hamano
2022-04-16  7:02           ` Edmundo Carmona Antoranz
2022-04-17  5:05         ` Elijah Newren
2022-04-17  5:37           ` Edmundo Carmona Antoranz
2022-04-17 17:22             ` Martin von Zweigbergk
2022-04-18  7:04               ` Edmundo Carmona Antoranz
2022-04-18  7:29           ` Sergey Organov
2022-04-18 16:27             ` Elijah Newren
2022-04-18 17:33               ` Sergey Organov
2022-04-20 11:27               ` Tao Klerks
2022-04-21  2:33                 ` Elijah Newren
2022-04-13 17:26 ` rsbecker
2022-04-13 17:30   ` Edmundo Carmona Antoranz
2022-04-13 17:44     ` Edmundo Carmona Antoranz
2022-04-13 17:44 ` Phillip Susi
2022-04-13 17:49   ` Edmundo Carmona Antoranz
2022-04-13 19:07     ` Ævar Arnfjörð Bjarmason
2022-04-13 17:48 ` Junio C Hamano [this message]
2022-04-13 17:56   ` Edmundo Carmona Antoranz
2022-04-13 20:06   ` Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqlew898n3.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=eantoranz@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).