git.vger.kernel.org archive mirror
* [PATCH 0/7] refs: add reflog support to `git refs migrate`
@ 2024-12-09 11:07 Karthik Nayak
  2024-12-09 11:07 ` [PATCH 1/7] refs: include committer info in `ref_update` struct Karthik Nayak
                   ` (8 more replies)
  0 siblings, 9 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the feature was that it didn't support migrating
repositories which contained reflogs. This isn't a requirement on the
server side as repositories are stored as bare repositories (which do
not contain any reflogs). Clients however generally use reflogs and
until now couldn't use the `git refs migrate` command to migrate their
repositories to the new reftable format.

One of the issues with adding reflog support is that ref transactions
don't support reflog additions:
  1. While there is a REF_LOG_ONLY flag, there is no function that uses
  the flag to add reflogs.
  2. Reference backends generally sort the updates by refname. This
  wouldn't work for reflogs, which must maintain their order of
  creation.
  3. In the files backend, reflog entries are added by obtaining locks
  on the refs themselves. This means each update in the transaction
  obtains a ref_lock. This paradigm fails to accommodate the fact that
  there could be multiple reflog updates for a refname in a single
  transaction.
  4. The backends check for duplicate entries, which doesn't make sense
  in the context of adding multiple reflogs for a given refname.

To overcome these issues, we make the following changes:
  - Update the ref_update structure to also include the committer
  information. Using this, we can add a new function which only adds
  reflog updates to the transaction.
  - Add an index field to the ref_update structure. This helps order
  updates in a pre-defined order, which fixes #2.
  - While the ideal fix for #3 would be to actually introduce reflog
  locks, this wouldn't be possible without breaking backward
  compatibility. So we add a count field to the existing ref_lock. With
  this, multiple reflog updates can share a single ref_lock.

Overall, this series is a bit more involved, and I would appreciate it
if it receives a bit more scrutiny.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Karthik Nayak (7):
      refs: include committer info in `ref_update` struct
      refs: add `index` field to `struct ref_update`
      refs/files: add count field to ref_lock
      refs: extract out refname verification in transactions
      refs: introduce the `ref_transaction_update_reflog` function
      refs: allow multiple reflog entries for the same refname
      refs: add support for migrating reflogs

 Documentation/git-refs.txt |   2 -
 refs.c                     | 204 ++++++++++++++++++++++++++++++++-------------
 refs.h                     |  12 +++
 refs/files-backend.c       | 144 ++++++++++++++++++++------------
 refs/refs-internal.h       |  24 ++++--
 refs/reftable-backend.c    |  47 +++++++++--
 t/t1460-refs-migrate.sh    |  73 +++++++++++-----
 7 files changed, 360 insertions(+), 146 deletions(-)
---

base-commit: e66fd72e972df760a53c3d6da023c17adfc426d6
change-id: 20241111-320-git-refs-migrate-reflogs-a53e3a6cffc9

Thanks
- Karthik



* [PATCH 1/7] refs: include committer info in `ref_update` struct
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-10 16:51   ` Christian Couder
  2024-12-09 11:07 ` [PATCH 2/7] refs: add `index` field to `struct ref_update` Karthik Nayak
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface for creating reflogs, including custom
committer information.

Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
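A minimal standalone sketch of the fallback described above, for
illustration only. It is not the code from this patch:
`default_committer()` and `format_reflog_line()` are hypothetical names,
and `default_committer()` only stands in for `git_committer_info(0)`,
which in git builds the ident from the environment and the current time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for git_committer_info(0). The fixed string
 * below is made up for the sake of the example.
 */
static const char *default_committer(void)
{
	return "Default User <default@example.com> 1733742420 +0000";
}

/*
 * Format a single reflog line into buf. A NULL committer means "use
 * whatever ident is current when the entry is written", which mirrors
 * the NULL check added to log_ref_write_fd() in this patch.
 */
static int format_reflog_line(char *buf, size_t len,
			      const char *old_hex, const char *new_hex,
			      const char *committer, const char *msg)
{
	assert(buf && old_hex && new_hex);

	if (!committer)
		committer = default_committer();

	/* The message is optional and separated by a tab. */
	if (msg && *msg)
		return snprintf(buf, len, "%s %s %s\t%s\n",
				old_hex, new_hex, committer, msg);
	return snprintf(buf, len, "%s %s %s\n", old_hex, new_hex, committer);
}
```

Keeping the default as NULL, rather than resolving it when the update is
queued, is what lets the ident be computed at write time.
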
 refs.c                  |  1 +
 refs/files-backend.c    | 22 +++++++++++++---------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 12 +++++++++++-
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 762f3e324d59c60cd4f05c2f257e54de8deb00e5..f003e51c6bf5229bfbce8ce61ffad7cdba0572e0 100644
--- a/refs.c
+++ b/refs.c
@@ -1151,6 +1151,7 @@ void ref_transaction_free(struct ref_transaction *transaction)
 
 	for (i = 0; i < transaction->nr; i++) {
 		free(transaction->updates[i]->msg);
+		free(transaction->updates[i]->committer_info);
 		free((char *)transaction->updates[i]->new_target);
 		free((char *)transaction->updates[i]->old_target);
 		free(transaction->updates[i]);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..13f8539e6caa923cd4834775fcb0cd7f90d82014 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 	struct strbuf sb = STRBUF_INIT;
 	int ret = 0;
 
+	if (!committer)
+		committer = git_committer_info(0);
+
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg) {
 		strbuf_addch(&sb, '\t');
@@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 }
 
 static int files_log_ref_write(struct files_ref_store *refs,
-			       const char *refname, const struct object_id *old_oid,
-			       const struct object_id *new_oid, const char *msg,
+			       const char *refname,
+			       const struct object_id *old_oid,
+			       const struct object_id *new_oid,
+			       const char *committer_info, const char *msg,
 			       int flags, struct strbuf *err)
 {
 	int logfd, result;
@@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
 
 	if (logfd < 0)
 		return 0;
-	result = log_ref_write_fd(logfd, old_oid, new_oid,
-				  git_committer_info(0), msg);
+	result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);
 	if (result) {
 		struct strbuf sb = STRBUF_INIT;
 		int save_errno = errno;
@@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
 	files_assert_main_repository(refs, "commit_ref_update");
 
 	clear_loose_ref_cache(refs);
-	if (files_log_ref_write(refs, lock->ref_name,
-				&lock->old_oid, oid,
+	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
 				logmsg, flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 		strbuf_addf(err, "cannot update the ref '%s': %s",
@@ -2007,8 +2010,8 @@ static int commit_ref_update(struct files_ref_store *refs,
 		if (head_ref && (head_flag & REF_ISSYMREF) &&
 		    !strcmp(head_ref, lock->ref_name)) {
 			struct strbuf log_err = STRBUF_INIT;
-			if (files_log_ref_write(refs, "HEAD",
-						&lock->old_oid, oid,
+			if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
+						oid, git_committer_info(0),
 						logmsg, flags, &log_err)) {
 				error("%s", log_err.buf);
 				strbuf_release(&log_err);
@@ -2969,7 +2972,8 @@ static int parse_and_write_reflog(struct files_ref_store *refs,
 	}
 
 	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid,
-				&update->new_oid, update->msg, update->flags, err)) {
+				&update->new_oid, update->committer_info,
+				update->msg, update->flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 
 		strbuf_addf(err, "cannot update the ref '%s': %s",
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 58aa56d1b27c85d606ed7c8c0d908e4b87d1066b..0fd95cdacd99e4a728c22f5286f6b3f0f360c110 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -113,6 +113,7 @@ struct ref_update {
 	void *backend_data;
 	unsigned int type;
 	char *msg;
+	char *committer_info;
 
 	/*
 	 * If this ref_update was split off of a symref update via
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			}
 
 			if (create_reflog) {
+				struct ident_split c;
+
 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 				log = &logs[logs_nr++];
 				memset(log, 0, sizeof(*log));
 
-				fill_reftable_log_record(log, &committer_ident);
+				if (u->committer_info) {
+					if (split_ident_line(&c, u->committer_info,
+							     strlen(u->committer_info)))
+						BUG("failed splitting committer info");
+				} else {
+					c = committer_ident;
+				}
+
+				fill_reftable_log_record(log, &c);
 				log->update_index = ts;
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,

-- 
2.47.1



* [PATCH 2/7] refs: add `index` field to `struct ref_update`
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
  2024-12-09 11:07 ` [PATCH 1/7] refs: include committer info in `ref_update` struct Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-09 11:07 ` [PATCH 3/7] refs/files: add count field to ref_lock Karthik Nayak
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

The reftable backend sorts its updates by refname before applying them.
This ensures that the references are stored sorted. When migrating
reflogs from one backend to another, the order of the reflogs must be
maintained. Add a new `index` field to the `ref_update` struct to
facilitate this.

This field is used in the reftable backend's sort comparison function
`transaction_update_cmp`, to ensure that indexed updates maintain their
order.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
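The ordering rule described above can be tried out in isolation with
the sketch below. It is an illustration under assumptions, not the
patch's code: `struct update`, `update_cmp()` and `sort_updates()` are
hypothetical names, and the comparator uses an explicit three-way
comparison instead of subtracting the two unsigned indexes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for `struct ref_update`. */
struct update {
	const char *refname;
	unsigned int index; /* 0 means no explicit order was requested */
};

/*
 * If either side carries an index, order by index; otherwise fall
 * back to sorting by refname.
 */
static int update_cmp(const void *a, const void *b)
{
	const struct update *ua = a;
	const struct update *ub = b;

	if (ua->index || ub->index)
		return (ua->index > ub->index) - (ua->index < ub->index);

	return strcmp(ua->refname, ub->refname);
}

static void sort_updates(struct update *updates, size_t nr)
{
	assert(updates || !nr);
	if (nr)
		qsort(updates, nr, sizeof(*updates), update_cmp);
}
```

With this rule, updates without an index sort by refname ahead of the
indexed ones, and indexed updates for the same refname keep the order
given by their index.
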
 refs/refs-internal.h    |  7 +++++++
 refs/reftable-backend.c | 13 +++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0fd95cdacd99e4a728c22f5286f6b3f0f360c110..f5c733d099f0c6f1076a25f4f77d9d5eb345ec87 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -115,6 +115,13 @@ struct ref_update {
 	char *msg;
 	char *committer_info;
 
+	/*
+	 * The index overrides the default sort algorithm. This is needed
+	 * when migrating reflogs and we want to ensure we carry over the
+	 * same order.
+	 */
+	unsigned int index;
+
 	/*
 	 * If this ref_update was split off of a symref update via
 	 * split_symref_update(), then this member points at that
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
 
 static int transaction_update_cmp(const void *a, const void *b)
 {
-	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
-		      ((struct reftable_transaction_update *)b)->update->refname);
+	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
+	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
+
+	/*
+	 * If there is an index set, it should take preference (default is 0).
+	 * This ensures that updates with indexes are sorted amongst themselves.
+	 */
+	if (update_a->update->index || update_b->update->index)
+		return update_a->update->index - update_b->update->index;
+
+	return strcmp(update_a->update->refname, update_b->update->refname);
 }
 
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)

-- 
2.47.1



* [PATCH 3/7] refs/files: add count field to ref_lock
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
  2024-12-09 11:07 ` [PATCH 1/7] refs: include committer info in `ref_update` struct Karthik Nayak
  2024-12-09 11:07 ` [PATCH 2/7] refs: add `index` field to `struct ref_update` Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-10 17:22   ` Christian Couder
  2024-12-11  9:05   ` Christian Couder
  2024-12-09 11:07 ` [PATCH 4/7] refs: extract out refname verification in transactions Karthik Nayak
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.

The upcoming patches will add support for reflog updates in ref
transactions. This means that a single transaction can contain both ref
updates and reflog updates. For a given ref, a transaction can contain
only one update, but it can theoretically contain multiple reflog
updates.

The current flow does not support this, because currently refs and
reflogs are treated as a single entity and capture the lock together. To
separate this, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.

This patch only adds the `count` field to `ref_lock` and adds the logic
to increment and decrement the count. In a follow-up commit, we'll
separate the reflog update logic from ref updates and utilize this
functionality.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
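The counting scheme described above, reduced to a standalone sketch.
The names `counted_lock`, `lock_new()`, `lock_share()` and
`lock_release()` are hypothetical and chosen for illustration. The
sketch leaves out the lock file itself and the per-transaction map from
refname to lock that the patch keeps in `ref_locks`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for `struct ref_lock`. */
struct counted_lock {
	char *ref_name;
	unsigned int count; /* number of updates sharing this lock */
};

/* Create a lock with a single user; returns NULL on allocation failure. */
static struct counted_lock *lock_new(const char *refname)
{
	struct counted_lock *lock = calloc(1, sizeof(*lock));

	if (!lock)
		return NULL;
	lock->ref_name = malloc(strlen(refname) + 1);
	if (!lock->ref_name) {
		free(lock);
		return NULL;
	}
	strcpy(lock->ref_name, refname);
	lock->count = 1;
	return lock;
}

/* A further update for the same refname reuses the existing lock. */
static void lock_share(struct counted_lock *lock)
{
	assert(lock && lock->count > 0);
	lock->count++;
}

/*
 * Drop one user. The lock is only torn down when the last user lets
 * go. Returns 1 if the lock was freed, 0 if it is still held.
 */
static int lock_release(struct counted_lock *lock)
{
	assert(lock && lock->count > 0);
	if (--lock->count)
		return 0;
	free(lock->ref_name);
	free(lock);
	return 1;
}
```

The caller must not touch the lock after `lock_release()` returns 1,
which is the same constraint `unlock_ref()` places on its callers.
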
 refs/files-backend.c | 59 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 13f8539e6caa923cd4834775fcb0cd7f90d82014..9c929c1ac33bc62a75620e684a809d46b574f1c6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -71,6 +71,8 @@ struct ref_lock {
 	char *ref_name;
 	struct lock_file lk;
 	struct object_id old_oid;
+	/* count keeps track of users of the lock */
+	unsigned int count;
 };
 
 struct files_ref_store {
@@ -638,9 +640,12 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 
 static void unlock_ref(struct ref_lock *lock)
 {
-	rollback_lock_file(&lock->lk);
-	free(lock->ref_name);
-	free(lock);
+	lock->count--;
+	if (!lock->count) {
+		rollback_lock_file(&lock->lk);
+		free(lock->ref_name);
+		free(lock);
+	}
 }
 
 /*
@@ -696,6 +701,7 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	*lock_p = CALLOC_ARRAY(lock, 1);
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 	files_ref_path(refs, &ref_file, refname);
 
 retry:
@@ -1169,6 +1175,7 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 		goto error_return;
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 
 	if (raceproof_create_file(ref_file.buf, create_reflock, &lock->lk)) {
 		unable_to_lock_message(ref_file.buf, errno, err);
@@ -2535,6 +2542,12 @@ static int check_old_oid(struct ref_update *update, struct object_id *oid,
 	return -1;
 }
 
+struct files_transaction_backend_data {
+	struct ref_transaction *packed_transaction;
+	int packed_refs_locked;
+	struct strmap ref_locks;
+};
+
 /*
  * Prepare for carrying out update:
  * - Lock the reference referred to by update.
@@ -2557,11 +2570,14 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 {
 	struct strbuf referent = STRBUF_INIT;
 	int mustexist = ref_update_expects_existing_old_ref(update);
+	struct files_transaction_backend_data *backend_data;
 	int ret = 0;
 	struct ref_lock *lock;
 
 	files_assert_main_repository(refs, "lock_ref_for_update");
 
+	backend_data = transaction->backend_data;
+
 	if ((update->flags & REF_HAVE_NEW) && ref_update_has_null_new_value(update))
 		update->flags |= REF_DELETING;
 
@@ -2572,18 +2588,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			goto out;
 	}
 
-	ret = lock_raw_ref(refs, update->refname, mustexist,
-			   affected_refnames,
-			   &lock, &referent,
-			   &update->type, err);
-	if (ret) {
-		char *reason;
+	lock = strmap_get(&backend_data->ref_locks, update->refname);
+	if (lock) {
+		lock->count = lock->count + 1;
+	} else {
+		ret = lock_raw_ref(refs, update->refname, mustexist,
+				   affected_refnames,
+				   &lock, &referent,
+				   &update->type, err);
+		if (ret) {
+			char *reason;
+
+			reason = strbuf_detach(err, NULL);
+			strbuf_addf(err, "cannot lock ref '%s': %s",
+				    ref_update_original_update_refname(update), reason);
+			free(reason);
+			goto out;
+		}
 
-		reason = strbuf_detach(err, NULL);
-		strbuf_addf(err, "cannot lock ref '%s': %s",
-			    ref_update_original_update_refname(update), reason);
-		free(reason);
-		goto out;
+		strmap_put(&backend_data->ref_locks, update->refname, lock);
 	}
 
 	update->backend_data = lock;
@@ -2730,11 +2753,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	return ret;
 }
 
-struct files_transaction_backend_data {
-	struct ref_transaction *packed_transaction;
-	int packed_refs_locked;
-};
-
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
@@ -2767,6 +2785,8 @@ static void files_transaction_cleanup(struct files_ref_store *refs,
 		if (backend_data->packed_refs_locked)
 			packed_refs_unlock(refs->packed_ref_store);
 
+		strmap_clear(&backend_data->ref_locks, 0);
+
 		free(backend_data);
 	}
 
@@ -2796,6 +2816,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		goto cleanup;
 
 	CALLOC_ARRAY(backend_data, 1);
+	strmap_init(&backend_data->ref_locks);
 	transaction->backend_data = backend_data;
 
 	/*

-- 
2.47.1



* [PATCH 4/7] refs: extract out refname verification in transactions
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (2 preceding siblings ...)
  2024-12-09 11:07 ` [PATCH 3/7] refs/files: add count field to ref_lock Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-11  9:26   ` Christian Couder
  2024-12-09 11:07 ` [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified for:

  - Ensuring it is not a pseudoref.
  - Checking the refname format.

These checks will also be needed in a following commit where the function
to add reflog updates to the transaction is introduced. Extract the code
out into a new static function.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
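The shape of the extracted helper can be sketched on its own as below.
Everything here is a simplified assumption for illustration:
`SKIP_NAME_VERIFICATION`, `looks_like_pseudoref()` and
`name_is_acceptable()` are hypothetical stand-ins for
`REF_SKIP_REFNAME_VERIFICATION`, `is_pseudo_ref()` and git's refname
format checks, and they do not reproduce git's actual rules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKIP_NAME_VERIFICATION (1u << 0) /* hypothetical flag value */

/* Hypothetical stand-in for is_pseudo_ref(). */
static int looks_like_pseudoref(const char *refname)
{
	return !strcmp(refname, "FETCH_HEAD") ||
	       !strcmp(refname, "MERGE_HEAD");
}

/* Deliberately simplistic format check, for illustration only. */
static int name_is_acceptable(const char *refname)
{
	return *refname && !strstr(refname, "..") && !strchr(refname, ' ');
}

/*
 * One place for both checks, so that every function that queues an
 * update can share it. Returns 0 on success and -1 with a message in
 * err otherwise.
 */
static int verify_refname(const char *refname, unsigned int flags,
			  char *err, size_t errlen)
{
	assert(refname && err);

	if (flags & SKIP_NAME_VERIFICATION)
		return 0;

	if (looks_like_pseudoref(refname)) {
		snprintf(err, errlen, "refusing to update pseudoref '%s'",
			 refname);
		return -1;
	}
	if (!name_is_acceptable(refname)) {
		snprintf(err, errlen,
			 "refusing to update ref with bad name '%s'", refname);
		return -1;
	}
	return 0;
}
```
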
 refs.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/refs.c b/refs.c
index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..732c236a3fd0cf324cc172b48d3d54f6dbadf4a4 100644
--- a/refs.c
+++ b/refs.c
@@ -1196,6 +1196,29 @@ struct ref_update *ref_transaction_add_update(
 	return update;
 }
 
+static int transaction_refname_verification(const char *refname,
+					    const struct object_id *new_oid,
+					    unsigned int flags,
+					    struct strbuf *err)
+{
+	if (flags & REF_SKIP_REFNAME_VERIFICATION)
+		return 0;
+
+	if (is_pseudo_ref(refname)) {
+		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
+			    refname);
+		return -1;
+	} else if ((new_oid && !is_null_oid(new_oid)) ?
+		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
+		 !refname_is_safe(refname)) {
+		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
+			    refname);
+		return -1;
+	}
+
+	return 0;
+}
+
 int ref_transaction_update(struct ref_transaction *transaction,
 			   const char *refname,
 			   const struct object_id *new_oid,
@@ -1205,6 +1228,8 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err)
 {
+	int ret;
+
 	assert(err);
 
 	if ((flags & REF_FORCE_CREATE_REFLOG) &&
@@ -1213,21 +1238,9 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    ((new_oid && !is_null_oid(new_oid)) ?
-		     check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
-			   !refname_is_safe(refname))) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
-		return -1;
-	}
-
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
-		return -1;
-	}
+	ret = transaction_refname_verification(refname, new_oid, flags, err);
+	if (ret)
+		return ret;
 
 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
 		BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);

-- 
2.47.1



* [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (3 preceding siblings ...)
  2024-12-09 11:07 ` [PATCH 4/7] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-11 10:10   ` Christian Couder
  2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-09 11:07 ` [PATCH 6/7] refs: allow multiple reflog entries for the same refname Karthik Nayak
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

Introduce a new function `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few more things. It:
  - Enforces that only a reflog entry is added and does not update the
  ref itself.
  - Allows the users to also provide the committer information. This
  means clients can add reflog entries with custom committer
  information.

A follow-up commit will utilize this function to add reflog support to
`git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
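The flag handling that makes an update "reflog only" can be sketched
as follows. This is a standalone illustration under assumptions: the
`FLAG_*` bit values, `struct reflog_update`, `make_reflog_update()` and
`touches_ref()` are hypothetical and do not match git's definitions.

```c
#include <assert.h>

/* Hypothetical bit values; git defines its own REF_* flags. */
#define FLAG_NO_DEREF (1u << 0)
#define FLAG_HAVE_NEW (1u << 1)
#define FLAG_HAVE_OLD (1u << 2)
#define FLAG_LOG_ONLY (1u << 3)

/* Hypothetical, trimmed-down stand-in for `struct ref_update`. */
struct reflog_update {
	unsigned int flags;
	unsigned int index;
	const char *committer_info;
};

/*
 * Force the "log only, do not dereference" flags and drop the
 * old-value check, which only makes sense when the ref itself is
 * being updated.
 */
static struct reflog_update make_reflog_update(unsigned int flags,
					       const char *committer_info,
					       unsigned int index)
{
	struct reflog_update u;

	u.flags = (flags | FLAG_LOG_ONLY | FLAG_NO_DEREF) & ~FLAG_HAVE_OLD;
	u.index = index;
	u.committer_info = committer_info;
	return u;
}

/* Nonzero if the update would also modify the ref itself. */
static int touches_ref(const struct reflog_update *u)
{
	assert(u);
	return !(u->flags & FLAG_LOG_ONLY);
}
```

Whatever flags the caller passes in, the resulting update never touches
the ref and never verifies the old value.
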
 refs.c                  | 89 +++++++++++++++++++++++++++++++++++++------------
 refs.h                  | 12 +++++++
 refs/files-backend.c    | 48 +++++++++++++++-----------
 refs/refs-internal.h    | 16 +++++----
 refs/reftable-backend.c |  6 ++--
 5 files changed, 122 insertions(+), 49 deletions(-)

diff --git a/refs.c b/refs.c
index 732c236a3fd0cf324cc172b48d3d54f6dbadf4a4..602a65873181a90751def525608a7fa7bea59562 100644
--- a/refs.c
+++ b/refs.c
@@ -1160,13 +1160,15 @@ void ref_transaction_free(struct ref_transaction *transaction)
 	free(transaction);
 }
 
-struct ref_update *ref_transaction_add_update(
-		struct ref_transaction *transaction,
-		const char *refname, unsigned int flags,
-		const struct object_id *new_oid,
-		const struct object_id *old_oid,
-		const char *new_target, const char *old_target,
-		const char *msg)
+struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
+					      const char *refname,
+					      unsigned int flags,
+					      const struct object_id *new_oid,
+					      const struct object_id *old_oid,
+					      const char *new_target,
+					      const char *old_target,
+					      const char *committer_info,
+					      const char *msg)
 {
 	struct ref_update *update;
 
@@ -1190,8 +1192,15 @@ struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if ((flags & REF_HAVE_OLD) && old_oid)
 		oidcpy(&update->old_oid, old_oid);
-	if (!(flags & REF_SKIP_CREATE_REFLOG))
+	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
+		if (committer_info) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, committer_info);
+			update->committer_info = strbuf_detach(&sb, NULL);
+		}
+
 		update->msg = normalize_reflog_message(msg);
+	}
 
 	return update;
 }
@@ -1199,20 +1208,29 @@ struct ref_update *ref_transaction_add_update(
 static int transaction_refname_verification(const char *refname,
 					    const struct object_id *new_oid,
 					    unsigned int flags,
+					    unsigned int reflog,
 					    struct strbuf *err)
 {
 	if (flags & REF_SKIP_REFNAME_VERIFICATION)
 		return 0;
 
 	if (is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
+		if (reflog)
+			strbuf_addf(err, _("refusing to update reflog for pseudoref '%s'"),
+				    refname);
+		else
+			strbuf_addf(err, _("refusing to update pseudoref '%s'"),
+				    refname);
 		return -1;
 	} else if ((new_oid && !is_null_oid(new_oid)) ?
 		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
 		 !refname_is_safe(refname)) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+		if (reflog)
+			strbuf_addf(err, _("refusing to update reflog with bad name '%s'"),
+				    refname);
+		else
+			strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
+				    refname);
 		return -1;
 	}
 
@@ -1238,7 +1256,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	ret = transaction_refname_verification(refname, new_oid, flags, err);
+	ret = transaction_refname_verification(refname, new_oid, flags, 0, err);
 	if (ret)
 		return ret;
 
@@ -1255,18 +1273,47 @@ int ref_transaction_update(struct ref_transaction *transaction,
 	flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
 	flags |= (new_target ? REF_HAVE_NEW : 0) | (old_target ? REF_HAVE_OLD : 0);
 
-	ref_transaction_add_update(transaction, refname, flags,
-				   new_oid, old_oid, new_target,
-				   old_target, msg);
+	ref_transaction_add_update(transaction, refname, flags, new_oid,
+				   old_oid, new_target, old_target, NULL, msg);
+	return 0;
+}
+
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err)
+{
+	struct ref_update *update;
+	int ret;
+
+	assert(err);
+
+	ret = transaction_refname_verification(refname, new_oid, flags, 1, err);
+	if (ret)
+		return ret;
+
+	flags |= REF_LOG_ONLY | REF_NO_DEREF;
+
+	update = ref_transaction_add_update(transaction, refname, flags,
+					    new_oid, old_oid, NULL, NULL,
+					    committer_info, msg);
+	/*
+	 * While we do set the old_oid value, we unset the flag to skip
+	 * old_oid verification which only makes sense for refs.
+	 */
+	update->flags &= ~REF_HAVE_OLD;
+	update->index = index;
+
 	return 0;
 }
 
 int ref_transaction_create(struct ref_transaction *transaction,
-			   const char *refname,
-			   const struct object_id *new_oid,
-			   const char *new_target,
-			   unsigned int flags, const char *msg,
-			   struct strbuf *err)
+			   const char *refname, const struct object_id *new_oid,
+			   const char *new_target, unsigned int flags,
+			   const char *msg, struct strbuf *err)
 {
 	if (new_oid && new_target)
 		BUG("create called with both new_oid and new_target set");
diff --git a/refs.h b/refs.h
index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..b86d2cd87be33f7bb1b31fce711d6c7c8d9491c9 100644
--- a/refs.h
+++ b/refs.h
@@ -727,6 +727,18 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err);
 
+/*
+ * Similar to `ref_transaction_update`, but this function is only for adding
+ * reflog updates. Supports providing custom committer information.
+ */
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err);
+
 /*
  * Add a reference creation to transaction. new_oid is the value that
  * the reference should have after the update; it must not be
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9c929c1ac33bc62a75620e684a809d46b574f1c6..32975e0fd7a03ab8ddf99c0a68af99921d3f5090 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1268,10 +1268,10 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 	transaction = ref_store_transaction_begin(&refs->base, 0, &err);
 	if (!transaction)
 		goto cleanup;
-	ref_transaction_add_update(
-			transaction, r->name,
-			REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | REF_IS_PRUNING,
-			null_oid(), &r->oid, NULL, NULL, NULL);
+	ref_transaction_add_update(transaction, r->name,
+				   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD |
+				   REF_IS_PRUNING, null_oid(), &r->oid, NULL,
+				   NULL, NULL, NULL);
 	if (ref_transaction_commit(transaction, &err))
 		goto cleanup;
 
@@ -2418,7 +2418,7 @@ static int split_head_update(struct ref_update *update,
 			transaction, "HEAD",
 			update->flags | REF_LOG_ONLY | REF_NO_DEREF,
 			&update->new_oid, &update->old_oid,
-			NULL, NULL, update->msg);
+			NULL, NULL, update->committer_info, update->msg);
 
 	/*
 	 * Add "HEAD". This insertion is O(N) in the transaction
@@ -2482,7 +2482,8 @@ static int split_symref_update(struct ref_update *update,
 			transaction, referent, new_flags,
 			update->new_target ? NULL : &update->new_oid,
 			update->old_target ? NULL : &update->old_oid,
-			update->new_target, update->old_target, update->msg);
+			update->new_target, update->old_target, NULL,
+			update->msg);
 
 	new_update->parent_update = update;
 
@@ -2911,11 +2912,11 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					packed_transaction;
 			}
 
-			ref_transaction_add_update(
-					packed_transaction, update->refname,
-					REF_HAVE_NEW | REF_NO_DEREF,
-					&update->new_oid, NULL,
-					NULL, NULL, NULL);
+			ref_transaction_add_update(packed_transaction,
+						   update->refname,
+						   REF_HAVE_NEW | REF_NO_DEREF,
+						   &update->new_oid, NULL, NULL,
+						   NULL, NULL, NULL);
 		}
 	}
 
@@ -3080,10 +3081,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 
 		/*
-		 * packed-refs don't support symbolic refs and root refs, so we
-		 * have to queue these references via the loose transaction.
+		 * packed-refs don't support symbolic refs, root refs and reflogs,
+		 * so we have to queue these references via the loose transaction.
 		 */
-		if (update->new_target || is_root_ref(update->refname)) {
+		if (update->new_target ||
+		    is_root_ref(update->refname) ||
+		    (update->flags & REF_LOG_ONLY)) {
 			if (!loose_transaction) {
 				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
 				if (!loose_transaction) {
@@ -3092,15 +3095,22 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 				}
 			}
 
-			ref_transaction_add_update(loose_transaction, update->refname,
-						   update->flags & ~REF_HAVE_OLD,
-						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, NULL);
+			if (update->flags & REF_LOG_ONLY)
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags, &update->new_oid,
+							   &update->old_oid, NULL, NULL,
+							   update->committer_info, update->msg);
+			else
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags & ~REF_HAVE_OLD,
+							   update->new_target ? NULL : &update->new_oid, NULL,
+							   update->new_target, NULL, update->committer_info,
+							   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   &update->new_oid, &update->old_oid,
-						   NULL, NULL, NULL);
+						   NULL, NULL, update->committer_info, NULL);
 		}
 	}
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f5c733d099f0c6f1076a25f4f77d9d5eb345ec87..82c1387d1e6ab3658b31fe99c95f98645ff1ebf1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -156,13 +156,15 @@ int ref_update_reject_duplicates(struct string_list *refnames,
  * dereferenced if the REF_HAVE_NEW and REF_HAVE_OLD bits,
  * respectively, are set in flags.
  */
-struct ref_update *ref_transaction_add_update(
-		struct ref_transaction *transaction,
-		const char *refname, unsigned int flags,
-		const struct object_id *new_oid,
-		const struct object_id *old_oid,
-		const char *new_target, const char *old_target,
-		const char *msg);
+struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
+					      const char *refname,
+					      unsigned int flags,
+					      const struct object_id *new_oid,
+					      const struct object_id *old_oid,
+					      const char *new_target,
+					      const char *old_target,
+					      const char *committer_info,
+					      const char *msg);
 
 /*
  * Transaction states.
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c008f20be719fec3af6a8f81c821cb9c263764d7..b2e3ba877de9e59fea5a4d066eb13e60ef22a32b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1078,7 +1078,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			new_update = ref_transaction_add_update(
 					transaction, "HEAD",
 					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
-					&u->new_oid, &u->old_oid, NULL, NULL, u->msg);
+					&u->new_oid, &u->old_oid, NULL, NULL, NULL,
+					u->msg);
 			string_list_insert(&affected_refnames, new_update->refname);
 		}
 
@@ -1161,7 +1162,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 					transaction, referent.buf, new_flags,
 					u->new_target ? NULL : &u->new_oid,
 					u->old_target ? NULL : &u->old_oid,
-					u->new_target, u->old_target, u->msg);
+					u->new_target, u->old_target,
+					u->committer_info, u->msg);
 
 				new_update->parent_update = u;
 

-- 
2.47.1



* [PATCH 6/7] refs: allow multiple reflog entries for the same refname
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (4 preceding siblings ...)
  2024-12-09 11:07 ` [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-11 10:44   ` Christian Couder
  2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-09 11:07 ` [PATCH 7/7] refs: add support for migrating reflogs Karthik Nayak
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

The reference transaction only allows a single update for a given reference to
avoid conflicts. This, however, isn't an issue for reflogs. There are no
conflicts to be resolved in reflogs and when migrating reflogs between
backends we'd have multiple reflog entries for the same refname.

So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.

This is required to add reflog migration support to `git refs migrate`
which currently doesn't support it.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 16 +++++++++++++---
 2 files changed, 24 insertions(+), 7 deletions(-)
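
A minimal sketch of the two behaviours this patch changes, written against
hypothetical stand-ins rather than git's real types: `struct sketch_update`,
`SKETCH_LOG_ONLY`, `sketch_has_duplicate_refs()` and
`sketch_assign_log_indices()` are illustrative names, not part of the series.
It assumes only log-only updates receive an index, and it uses a linear scan
where the patch uses a `strintmap`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's REF_LOG_ONLY flag and struct ref_update. */
#define SKETCH_LOG_ONLY (1u << 0)

struct sketch_update {
	const char *refname;
	unsigned int flags;
	uint64_t update_index; /* filled in for log-only updates */
};

static int sketch_cmp_refname(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Return 1 if two non-log-only updates share a refname, 0 if not and
 * -1 on allocation failure. Log-only updates are never collected, so
 * they cannot trigger the duplicate check.
 */
static int sketch_has_duplicate_refs(const struct sketch_update *updates,
				     size_t nr)
{
	const char **names;
	size_t i, names_nr = 0;
	int dup = 0;

	names = malloc((nr ? nr : 1) * sizeof(*names));
	if (!names)
		return -1;

	for (i = 0; i < nr; i++)
		if (!(updates[i].flags & SKETCH_LOG_ONLY))
			names[names_nr++] = updates[i].refname;

	qsort(names, names_nr, sizeof(*names), sketch_cmp_refname);
	for (i = 1; i < names_nr; i++)
		if (!strcmp(names[i - 1], names[i]))
			dup = 1;

	free(names);
	return dup;
}

/*
 * Give the first log entry of each refname the index `ts`, the next
 * one `ts + 1`, and so on. The scan over earlier updates counts how
 * many log entries the same refname already received.
 */
static void sketch_assign_log_indices(struct sketch_update *updates,
				      size_t nr, uint64_t ts)
{
	size_t i, j;

	assert(updates || !nr);

	for (i = 0; i < nr; i++) {
		uint64_t index = ts;

		if (!(updates[i].flags & SKETCH_LOG_ONLY))
			continue;
		for (j = 0; j < i; j++)
			if ((updates[j].flags & SKETCH_LOG_ONLY) &&
			    !strcmp(updates[j].refname, updates[i].refname))
				index++;
		updates[i].update_index = index;
	}
}
```

With one regular update and two log-only updates for the same refname, the
duplicate check passes and the two log entries receive consecutive indices
starting at `ts`.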

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2830,13 +2833,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1302,6 +1303,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
 	const char *committer_info;
+	struct strintmap logs_ts;
 	int ret = 0;
 
 	committer_info = git_committer_info(0);
@@ -1310,6 +1312,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
 
+	strintmap_init(&logs_ts, ts);
+
 	reftable_writer_set_limits(writer, ts, ts);
 
 	for (i = 0; i < arg->updates_nr; i++) {
@@ -1391,6 +1395,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 			if (create_reflog) {
 				struct ident_split c;
+				uint64_t update_index;
 
 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 				log = &logs[logs_nr++];
@@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}
 
 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				update_index = strintmap_get(&logs_ts, u->refname);
+				log->update_index = update_index;
+				strintmap_set(&logs_ts, u->refname, update_index+1);
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1476,6 +1485,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 done:
 	assert(ret != REFTABLE_API_ERROR);
+	strintmap_clear(&logs_ts);
 	for (i = 0; i < logs_nr; i++)
 		reftable_log_record_release(&logs[i]);
 	free(logs);

-- 
2.47.1



* [PATCH 7/7] refs: add support for migrating reflogs
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (5 preceding siblings ...)
  2024-12-09 11:07 ` [PATCH 6/7] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-09 11:07 ` Karthik Nayak
  2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-10 12:13 ` [PATCH 0/7] refs: add reflog support to `git refs migrate` Junio C Hamano
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-09 11:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit added support
for adding reflog updates in ref transactions. Using the added
functionality, bake in reflog support for `git refs migrate`.

To ensure that the order of the reflogs is maintained during the
migration, we add the index for each reflog update as we iterate over
the reflogs from the old reference backend, so that the new backend
writes the entries in the same order.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-refs.txt |  2 --
 refs.c                     | 81 ++++++++++++++++++++++++++++++++--------------
 t/t1460-refs-migrate.sh    | 73 ++++++++++++++++++++++++++++-------------
 3 files changed, 107 insertions(+), 49 deletions(-)
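
A minimal sketch of the ordering idea, under stated assumptions:
`struct sketch_entry`, `struct sketch_migration` and
`sketch_migrate_reflog()` are hypothetical names that only mirror the shared
`index` counter added to `struct migration_data`. Each entry is stamped with
the next counter value as it is visited, so the original order can be
recovered even when committer dates go backwards, as in the "reflogs order is
retained" test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reflog entry; only the fields needed for the sketch. */
struct sketch_entry {
	long timestamp;     /* committer date, may go backwards */
	unsigned int index; /* position assigned during migration */
};

/* Hypothetical counterpart of `struct migration_data`: one shared counter. */
struct sketch_migration {
	unsigned int index;
};

/* Stamp each entry of one reflog with the next value of the shared counter. */
static void sketch_migrate_reflog(struct sketch_migration *m,
				  struct sketch_entry *entries, size_t nr)
{
	size_t i;

	assert(m && (entries || !nr));

	for (i = 0; i < nr; i++)
		entries[i].index = m->index++;
}

/* Order entries by the assigned index, ignoring their timestamps. */
static int sketch_cmp_index(const void *a, const void *b)
{
	const struct sketch_entry *x = a, *y = b;

	return (x->index > y->index) - (x->index < y->index);
}
```

Sorting by `index` rather than by `timestamp` is what keeps an entry dated
100005000 ahead of a later entry dated 100003000.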

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index ce31f93061db5e5d16aca516dd3d15f6527db870..9829984b0a4c4f54ec7f9b6c6c7072f62b1d198d 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -57,8 +57,6 @@ KNOWN LIMITATIONS
 
 The ref format migration has several known limitations in its current form:
 
-* It is not possible to migrate repositories that have reflogs.
-
 * It is not possible to migrate repositories that have worktrees.
 
 * There is no way to block concurrent writes to the repository during an
diff --git a/refs.c b/refs.c
index 602a65873181a90751def525608a7fa7bea59562..4d10c7276391e8e85c66bd626bb0ecfec0941c6d 100644
--- a/refs.c
+++ b/refs.c
@@ -30,6 +30,7 @@
 #include "date.h"
 #include "commit.h"
 #include "wildmatch.h"
+#include "ident.h"
 
 /*
  * List of all available backends
@@ -2687,6 +2688,7 @@ int ref_update_check_old_target(const char *referent, struct ref_update *update,
 }
 
 struct migration_data {
+	unsigned int index;
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
 	struct strbuf *errbuf;
@@ -2722,6 +2724,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 	return ret;
 }
 
+struct reflog_migration_data {
+	unsigned int *index;
+	const char *refname;
+	struct ref_store *old_refs;
+	struct ref_transaction *transaction;
+	struct strbuf *errbuf;
+};
+
+static int migrate_one_reflog_entry(struct object_id *old_oid,
+				    struct object_id *new_oid,
+				    const char *committer,
+				    timestamp_t timestamp, int tz,
+				    const char *msg, void *cb_data)
+{
+	struct reflog_migration_data *data = cb_data;
+	struct strbuf sb = STRBUF_INIT;
+	const char *date;
+	int ret;
+
+	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
+	/* committer contains name and email */
+	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
+
+	ret = ref_transaction_update_reflog(data->transaction, data->refname,
+					    new_oid, old_oid, sb.buf,
+					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
+					    (*data->index)++, data->errbuf);
+	strbuf_release(&sb);
+
+	return ret;
+}
+
+static int migrate_one_reflog(const char *refname, void *cb_data)
+{
+	struct migration_data *migration_data = cb_data;
+	struct reflog_migration_data data;
+
+	data.refname = refname;
+	data.old_refs = migration_data->old_refs;
+	data.transaction = migration_data->transaction;
+	data.errbuf = migration_data->errbuf;
+	data.index = &migration_data->index;
+
+	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
+					migrate_one_reflog_entry, &data);
+}
+
 static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
 {
 	struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
@@ -2788,13 +2837,6 @@ static int move_files(const char *from_path, const char *to_path, struct strbuf
 	return ret;
 }
 
-static int count_reflogs(const char *reflog UNUSED, void *payload)
-{
-	size_t *reflog_count = payload;
-	(*reflog_count)++;
-	return 0;
-}
-
 static int has_worktrees(void)
 {
 	struct worktree **worktrees = get_worktrees();
@@ -2820,7 +2862,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	struct ref_transaction *transaction = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	struct migration_data data;
-	size_t reflog_count = 0;
 	int did_migrate_refs = 0;
 	int ret;
 
@@ -2832,21 +2873,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 
 	old_refs = get_main_ref_store(repo);
 
-	/*
-	 * We do not have any interfaces that would allow us to write many
-	 * reflog entries. Once we have them we can remove this restriction.
-	 */
-	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
-		strbuf_addstr(errbuf, "cannot count reflogs");
-		ret = -1;
-		goto done;
-	}
-	if (reflog_count) {
-		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
-		ret = -1;
-		goto done;
-	}
-
 	/*
 	 * Worktrees complicate the migration because every worktree has a
 	 * separate ref storage. While it should be feasible to implement, this
@@ -2868,8 +2894,8 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	 *   1. Set up a new temporary directory and initialize it with the new
 	 *      format. This is where all refs will be migrated into.
 	 *
-	 *   2. Enumerate all refs and write them into the new ref storage.
-	 *      This operation is safe as we do not yet modify the main
+	 *   2. Enumerate all refs and reflogs and write them into the new ref
+	 *      storage. This operation is safe as we do not yet modify the main
 	 *      repository.
 	 *
 	 *   3. If we're in dry-run mode then we are done and can hand over the
@@ -2924,6 +2950,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
+	data.index = 1;
+	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+	if (ret < 0)
+		goto done;
+
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
 		goto done;
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index 1bfff3a7afd5acc470424dfe7ec3e97d45f5c481..f59bc4860f19c4af82dc6f2984bdb69d61fe3ec2 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -7,23 +7,44 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
+# Migrate the provided repository from one format to the other and
+# verify that the references and logs are migrated over correctly.
+# Usage: test_migration <repo> <format> <skip_reflog_verify>
+#   <repo> is the relative path to the repo to be migrated.
+#   <format> is the ref format to be migrated to.
+#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
 test_migration () {
-	git -C "$1" for-each-ref --include-root-refs \
+	repo=$1 &&
+	format=$2 &&
+	skip_reflog_verify=${3:-false} &&
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
-	git -C "$1" refs migrate --ref-format="$2" &&
-	git -C "$1" for-each-ref --include-root-refs \
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >expect_logs &&
+		git -C "$repo" reflog list >expect_log_list
+	fi &&
+
+	git -C "$repo" refs migrate --ref-format="$format" &&
+
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
 	test_cmp expect actual &&
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >actual_logs &&
+		git -C "$repo" reflog list >actual_log_list &&
+		test_cmp expect_logs actual_logs &&
+		test_cmp expect_log_list actual_log_list
+	fi &&
 
-	git -C "$1" rev-parse --show-ref-format >actual &&
-	echo "$2" >expect &&
+	git -C "$repo" rev-parse --show-ref-format >actual &&
+	echo "$format" >expect &&
 	test_cmp expect actual
 }
 
 test_expect_success 'setup' '
-	rm -rf .git &&
-	# The migration does not yet support reflogs.
-	git config --global core.logAllRefUpdates false
+	rm -rf .git
 '
 
 test_expect_success "superfluous arguments" '
@@ -78,19 +99,6 @@ do
 			test_cmp expect err
 		'
 
-		test_expect_success "$from_format -> $to_format: migration with reflog fails" '
-			test_when_finished "rm -rf repo" &&
-			git init --ref-format=$from_format repo &&
-			test_config -C repo core.logAllRefUpdates true &&
-			test_commit -C repo logged &&
-			test_must_fail git -C repo refs migrate \
-				--ref-format=$to_format 2>err &&
-			cat >expect <<-EOF &&
-			error: migrating reflogs is not supported yet
-			EOF
-			test_cmp expect err
-		'
-
 		test_expect_success "$from_format -> $to_format: migration with worktree fails" '
 			test_when_finished "rm -rf repo" &&
 			git init --ref-format=$from_format repo &&
@@ -141,7 +149,7 @@ do
 			test_commit -C repo initial &&
 			test-tool -C repo ref-store main update-ref "" refs/heads/broken \
 				"$(test_oid 001)" "$ZERO_OID" REF_SKIP_CREATE_REFLOG,REF_SKIP_OID_VERIFICATION &&
-			test_migration repo "$to_format" &&
+			test_migration repo "$to_format" true &&
 			test_oid 001 >expect &&
 			git -C repo rev-parse refs/heads/broken >actual &&
 			test_cmp expect actual
@@ -195,6 +203,27 @@ do
 			git -C repo rev-parse --show-ref-format >actual &&
 			test_cmp expect actual
 		'
+
+		test_expect_success "$from_format -> $to_format: reflogs of symrefs with target deleted" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			git -C repo branch branch-1 HEAD &&
+			git -C repo symbolic-ref refs/heads/symref refs/heads/branch-1 &&
+			cat >input <<-EOF &&
+			delete refs/heads/branch-1
+			EOF
+			git -C repo update-ref --stdin <input &&
+			test_migration repo "$to_format"
+		'
+
+		test_expect_success "$from_format -> $to_format: reflogs order is retained" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit --date "100005000 +0700" --no-tag -C repo initial &&
+			test_commit --date "100003000 +0700" --no-tag -C repo second &&
+			test_migration repo "$to_format"
+		'
 	done
 done
 

-- 
2.47.1



* Re: [PATCH 0/7] refs: add reflog support to `git refs migrate`
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (6 preceding siblings ...)
  2024-12-09 11:07 ` [PATCH 7/7] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-10 12:13 ` Junio C Hamano
  2024-12-10 17:42   ` karthik nayak
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
  8 siblings, 1 reply; 93+ messages in thread
From: Junio C Hamano @ 2024-12-10 12:13 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> The `git refs migrate` command was introduced in
> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
> 2024-06-06) to support migrating from one reference backend to another.

This topic passes the tests standalone for me locally, but seems to
fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
integration result tonight; it would be very much appreciated if you
can help find if there are semantic (or otherwise) mismerges that
are causing this breakage.

Thanks.


* Re: [PATCH 1/7] refs: include committer info in `ref_update` struct
  2024-12-09 11:07 ` [PATCH 1/7] refs: include committer info in `ref_update` struct Karthik Nayak
@ 2024-12-10 16:51   ` Christian Couder
  2024-12-11 10:13     ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-10 16:51 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:10 PM Karthik Nayak <karthik.188@gmail.com> wrote:


> If there is no `committer_info`
> provided, the reference backends default to using
> `git_committer_info(0)`. The field itself cannot be set to
> `git_committer_info(0)` since the values are dynamic and must be
> obtained right when the reflog is being committed.


> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..13f8539e6caa923cd4834775fcb0cd7f90d82014 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
>         struct strbuf sb = STRBUF_INIT;
>         int ret = 0;
>
> +       if (!committer)
> +               committer = git_committer_info(0);

It looks like this is where we obtain the value "right when the reflog
is being committed".

> +
>         strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
>         if (msg && *msg) {
>                 strbuf_addch(&sb, '\t');
> @@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
>  }
>
>  static int files_log_ref_write(struct files_ref_store *refs,
> -                              const char *refname, const struct object_id *old_oid,
> -                              const struct object_id *new_oid, const char *msg,
> +                              const char *refname,
> +                              const struct object_id *old_oid,
> +                              const struct object_id *new_oid,
> +                              const char *committer_info, const char *msg,
>                                int flags, struct strbuf *err)
>  {
>         int logfd, result;
> @@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
>
>         if (logfd < 0)
>                 return 0;
> -       result = log_ref_write_fd(logfd, old_oid, new_oid,
> -                                 git_committer_info(0), msg);
> +       result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);

Here we just pass the committer_info to the above function.

>         if (result) {
>                 struct strbuf sb = STRBUF_INIT;
>                 int save_errno = errno;
> @@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
>         files_assert_main_repository(refs, "commit_ref_update");
>
>         clear_loose_ref_cache(refs);
> -       if (files_log_ref_write(refs, lock->ref_name,
> -                               &lock->old_oid, oid,
> +       if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
>                                 logmsg, flags, err)) {

Here we don't have the info so we pass NULL.

>                 char *old_msg = strbuf_detach(err, NULL);
>                 strbuf_addf(err, "cannot update the ref '%s': %s",
> @@ -2007,8 +2010,8 @@ static int commit_ref_update(struct files_ref_store *refs,
>                 if (head_ref && (head_flag & REF_ISSYMREF) &&
>                     !strcmp(head_ref, lock->ref_name)) {
>                         struct strbuf log_err = STRBUF_INIT;
> -                       if (files_log_ref_write(refs, "HEAD",
> -                                               &lock->old_oid, oid,
> +                       if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
> +                                               oid, git_committer_info(0),

Here we don't have the info either, so I think we should also pass
NULL. It would then be computed "right when the reflog is being
committed" in the above function. No?

>                                                 logmsg, flags, &log_err)) {
>                                 error("%s", log_err.buf);
>                                 strbuf_release(&log_err);


> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data

It is not your fault but write_transaction_table() does the following
right at the beginning of the function:

       committer_info = git_committer_info(0);
       if (split_ident_line(&committer_ident, committer_info,
                            strlen(committer_info)))
               BUG("failed splitting committer info");

but then 'committer_ident' is only used in the hunk you are changing:

>                         if (create_reflog) {
> +                               struct ident_split c;
> +
>                                 ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
>                                 log = &logs[logs_nr++];
>                                 memset(log, 0, sizeof(*log));
>
> -                               fill_reftable_log_record(log, &committer_ident);
> +                               if (u->committer_info) {
> +                                       if (split_ident_line(&c, u->committer_info,
> +                                                            strlen(u->committer_info)))
> +                                               BUG("failed splitting committer info");
> +                               } else {

I would think it would be more efficient to only compute
'committer_ident' here, right before we use it if needed. Or is there
something I am missing?

> +                                       c = committer_ident;
> +                               }
> +
> +                               fill_reftable_log_record(log, &c);
>                                 log->update_index = ts;
>                                 log->refname = xstrdup(u->refname);
>                                 memcpy(log->value.update.new_hash,


* Re: [PATCH 3/7] refs/files: add count field to ref_lock
  2024-12-09 11:07 ` [PATCH 3/7] refs/files: add count field to ref_lock Karthik Nayak
@ 2024-12-10 17:22   ` Christian Couder
  2024-12-11 10:18     ` karthik nayak
  2024-12-11  9:05   ` Christian Couder
  1 sibling, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-10 17:22 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> When refs are updated in the files-backend, a lock is obtained for the
> corresponding file path. This is the case even for reflogs, i.e. a lock
> is obtained on the reference path instead of the reflog path. This
> works, since generally, reflogs are updated alongside the ref.
>
> The upcoming patches will add support for reflog updates in ref
> transactions. This means that in a particular transaction we want to have ref
> updates and reflog updates. For refs, in a given transaction there can
> only be one update. But, we can theoretically have multiple reflog
> updates in a given transaction.

Nit: Giving an example might help readers understand where multiple
reflog updates can happen in a given transaction. Alternatively,
pointing to an existing doc that contains such an example or
explanation might help too.

> @@ -2572,18 +2588,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>                         goto out;
>         }
>
> -       ret = lock_raw_ref(refs, update->refname, mustexist,
> -                          affected_refnames,
> -                          &lock, &referent,
> -                          &update->type, err);
> -       if (ret) {
> -               char *reason;
> +       lock = strmap_get(&backend_data->ref_locks, update->refname);
> +       if (lock) {
> +               lock->count = lock->count + 1;

Nit:
              lock->count++;

> +       } else {
> +               ret = lock_raw_ref(refs, update->refname, mustexist,
> +                                  affected_refnames,
> +                                  &lock, &referent,
> +                                  &update->type, err);
> +               if (ret) {
> +                       char *reason;
> +
> +                       reason = strbuf_detach(err, NULL);
> +                       strbuf_addf(err, "cannot lock ref '%s': %s",
> +                                   ref_update_original_update_refname(update), reason);
> +                       free(reason);
> +                       goto out;
> +               }
>
> -               reason = strbuf_detach(err, NULL);
> -               strbuf_addf(err, "cannot lock ref '%s': %s",
> -                           ref_update_original_update_refname(update), reason);
> -               free(reason);
> -               goto out;
> +               strmap_put(&backend_data->ref_locks, update->refname, lock);
>         }
>
>         update->backend_data = lock;


* Re: [PATCH 0/7] refs: add reflog support to `git refs migrate`
  2024-12-10 12:13 ` [PATCH 0/7] refs: add reflog support to `git refs migrate` Junio C Hamano
@ 2024-12-10 17:42   ` karthik nayak
  2024-12-10 18:03     ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: karthik nayak @ 2024-12-10 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, toon, Christian Couder


Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git refs migrate` command was introduced in
>> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
>> 2024-06-06) to support migrating from one reference backend to another.
>
> This topic passes the tests standalone for me locally, but seems to
> fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
> integration result tonight; it would be very much appreciated if you
> can help find if there are semantic (or otherwise) mismerges that
> are causing this breakage.
>

I see. I can reproduce it on 'seen' as you mentioned. Will debug and get
back to you on this. Thanks for letting me know.



* Re: [PATCH 0/7] refs: add reflog support to `git refs migrate`
  2024-12-10 17:42   ` karthik nayak
@ 2024-12-10 18:03     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-10 18:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, toon, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2341 bytes --]

karthik nayak <karthik.188@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Karthik Nayak <karthik.188@gmail.com> writes:
>>
>>> The `git refs migrate` command was introduced in
>>> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
>>> 2024-06-06) to support migrating from one reference backend to another.
>>
>> This topic passes the tests standalone for me locally, but seems to
>> fail 1460.17 and 1460.31 when merged to 'seen'.  I'll push out the
>> integration result tonight; it would be very much appreciated if you
>> can help find if there are semantic (or otherwise) mismerges that
>> are causing this breakage.
>>
>
> I see. I can reproduce it on 'seen' as you mentioned. Will debug and get
> back to you on this. Thanks for letting me know.

Seems like this is due to 'kn/reftable-writer-log-write-verify', which I
should have totally seen coming. A quick fix like the one below fixes
the issue. I'll merge in 'kn/reftable-writer-log-write-verify' when I
re-roll.

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1badf88df0..5c51a6a226 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1428,6 +1428,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	struct strintmap logs_ts;
 	int ret = 0;
@@ -1541,6 +1542,13 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				log->update_index = update_index;
 				strintmap_set(&logs_ts, u->refname, update_index+1);

+				/*
+				 * Note the max_update_index, so we can reset the limit
+				 * before actually writing the logs.
+				 */
+				if (update_index > max_update_index)
+					max_update_index = update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1604,6 +1612,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
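
The bookkeeping in the quick fix above can be sketched in isolation. This is
only an illustration under assumptions: `struct index_range` and
`log_index_range()` are hypothetical names, not part of the reftable API. It
shows how the upper limit has to grow to the largest update index handed out
to a log record, while the lower limit stays at `ts`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the writer limits; not the reftable API. */
struct index_range {
	uint64_t min;
	uint64_t max;
};

/*
 * Given the update indices assigned to the log records, return the range
 * a writer would have to accept: it starts at `ts` and ends at the
 * largest index seen (or at `ts` if there are no records).
 */
static struct index_range log_index_range(uint64_t ts,
					  const uint64_t *indices, size_t nr)
{
	struct index_range range = { ts, ts };
	size_t i;

	assert(indices || !nr);
	for (i = 0; i < nr; i++)
		if (indices[i] > range.max)
			range.max = indices[i];
	return range;
}
```

With `ts` equal to 5 and log indices 5, 6, 7 and 5, the range is 5 to 7, which
is what the patch passes to `reftable_writer_set_limits()` before the logs are
written.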

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 3/7] refs/files: add count field to ref_lock
  2024-12-09 11:07 ` [PATCH 3/7] refs/files: add count field to ref_lock Karthik Nayak
  2024-12-10 17:22   ` Christian Couder
@ 2024-12-11  9:05   ` Christian Couder
  2024-12-11 10:26     ` karthik nayak
  1 sibling, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-11  9:05 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> When refs are updated in the files-backend, a lock is obtained for the
> corresponding file path. This is the case even for reflogs, i.e. a lock
> is obtained on the reference path instead of the reflog path. This
> works, since generally, reflogs are updated alongside the ref.
>
> The upcoming patches will add support for reflog updates in ref
> transaction. This means, in a particular transaction we want to have ref
> updates and reflog updates. For refs, in a given transaction there can
> only be one update.

Maybe something like: "For a given ref in a given transaction there
can be at most one update."

> But, we can theoretically have multiple reflog
> updates in a given transaction.

And: "But we can theoretically have multiple reflog updates for a
given ref in a given transaction."

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 13f8539e6caa923cd4834775fcb0cd7f90d82014..9c929c1ac33bc62a75620e684a809d46b574f1c6 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -71,6 +71,8 @@ struct ref_lock {
>         char *ref_name;
>         struct lock_file lk;
>         struct object_id old_oid;
> +       /* count keeps track of users of the lock */
> +       unsigned int count;

Nit: maybe the following is a bit better:

      unsigned int count; /* track users of the lock (ref update + reflog updates) */

>  };
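
To illustrate what such a `count` field buys, here is a minimal sketch of a
lock shared by one ref update and several reflog updates. It is an assumption
for illustration only: `struct counted_lock` and its helpers are hypothetical
and much simpler than the real `struct ref_lock` handling in the files
backend.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified lock; the real `struct ref_lock` has more fields. */
struct counted_lock {
	unsigned int count; /* users of the lock (ref update + reflog updates) */
	int released;       /* set once the last user lets go */
};

/* Create a lock with a single user; returns NULL on allocation failure. */
static struct counted_lock *counted_lock_new(void)
{
	struct counted_lock *lock = calloc(1, sizeof(*lock));
	if (lock)
		lock->count = 1;
	return lock;
}

/* Register one more user of an already held lock. */
static void counted_lock_acquire(struct counted_lock *lock)
{
	assert(lock && !lock->released);
	lock->count++;
}

/* Returns 1 if this call released the lock, 0 if other users remain. */
static int counted_lock_release(struct counted_lock *lock)
{
	assert(lock && lock->count > 0);
	if (--lock->count)
		return 0;
	lock->released = 1;
	return 1;
}
```

The first update for a refname would create the lock, later updates for the
same refname would only bump the count, and the underlying lock file would be
dropped when the last user releases it.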

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 4/7] refs: extract out refname verification in transactions
  2024-12-09 11:07 ` [PATCH 4/7] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-11  9:26   ` Christian Couder
  2024-12-11 10:31     ` karthik nayak
  2024-12-11 14:26     ` Patrick Steinhardt
  0 siblings, 2 replies; 93+ messages in thread
From: Christian Couder @ 2024-12-11  9:26 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
> the refname of the update is verified for:
>
>   - Ensuring it is not a pseudoref.
>   - Checking the refname format.
>
> These checks are also be needed in a following commit where the function

s/are also be needed/will also be needed/

> to add reflog updates to the transaction is introduced. Extract the code
> out into a new static function.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c | 43 ++++++++++++++++++++++++++++---------------
>  1 file changed, 28 insertions(+), 15 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..732c236a3fd0cf324cc172b48d3d54f6dbadf4a4 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1196,6 +1196,29 @@ struct ref_update *ref_transaction_add_update(
>         return update;
>  }
>
> +static int transaction_refname_verification(const char *refname,
> +                                           const struct object_id *new_oid,
> +                                           unsigned int flags,
> +                                           struct strbuf *err)

We have a number of functions named 'xxx_valid()' or 'xxx_ok()' while
I couldn't find any 'yyy_verification()' function, so it might be
better to name it 'transaction_refname_valid()' or maybe
'transaction_refname_ok()'.

Also I think it should probably return a bool so 1 if the refname is
valid and 0 otherwise, unless we have plans in the future to follow
different code paths depending on the different ways it is not valid.

> +       ret = transaction_refname_verification(refname, new_oid, flags, err);
> +       if (ret)
> +               return ret;

Then the above could be just:

       if (!transaction_refname_valid(refname, new_oid, flags, err))
               return -1;
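
A self-contained sketch of that shape, for illustration only: the names
`refname_valid()` and `add_update()` are hypothetical, and the check itself is
a placeholder rather than Git's real refname rules. The predicate returns 1
for a valid name and 0 otherwise, and the caller maps a failure to the usual
-1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical predicate: 1 if `refname` is acceptable, 0 otherwise.
 * The check is a placeholder, not Git's real refname rules.
 */
static int refname_valid(const char *refname, char *err, size_t errlen)
{
	assert(refname && err);
	if (!*refname || strstr(refname, "..")) {
		snprintf(err, errlen,
			 "refusing to update ref with bad name '%s'", refname);
		return 0;
	}
	return 1;
}

/* The caller keeps the "0 on success, -1 on error" convention. */
static int add_update(const char *refname, char *err, size_t errlen)
{
	if (!refname_valid(refname, err, errlen))
		return -1;
	return 0;
}
```

The call site then reads as a question ("is it valid?") and no longer needs a
separate `ret` variable.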

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-09 11:07 ` [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-11 10:10   ` Christian Couder
  2024-12-11 18:06     ` karthik nayak
  2024-12-11 14:26   ` Patrick Steinhardt
  1 sibling, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-11 10:10 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Introduce a new function `ref_transaction_update_reflog`, for clients to
> add a reflog update to a transaction. While the existing function
> `ref_transaction_update` also allows clients to add a reflog entry, this
> function does a few more things. It:
>   - Enforces that only a reflog entry is added and does not update the
>   ref itself.
>   - Allows the users to also provide the committer information. This
>   means clients can add reflog entries with custom committer
>   information.



> A follow up commit will utilize this function to add reflog support to
> `git refs migrate`.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c                  | 89 +++++++++++++++++++++++++++++++++++++------------
>  refs.h                  | 12 +++++++
>  refs/files-backend.c    | 48 +++++++++++++++-----------
>  refs/refs-internal.h    | 16 +++++----
>  refs/reftable-backend.c |  6 ++--
>  5 files changed, 122 insertions(+), 49 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 732c236a3fd0cf324cc172b48d3d54f6dbadf4a4..602a65873181a90751def525608a7fa7bea59562 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1160,13 +1160,15 @@ void ref_transaction_free(struct ref_transaction *transaction)
>         free(transaction);
>  }
>
> -struct ref_update *ref_transaction_add_update(
> -               struct ref_transaction *transaction,
> -               const char *refname, unsigned int flags,
> -               const struct object_id *new_oid,
> -               const struct object_id *old_oid,
> -               const char *new_target, const char *old_target,
> -               const char *msg)
> +struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
> +                                             const char *refname,
> +                                             unsigned int flags,
> +                                             const struct object_id *new_oid,
> +                                             const struct object_id *old_oid,
> +                                             const char *new_target,
> +                                             const char *old_target,
> +                                             const char *committer_info,

This change (adding a 'const char *committer_info' argument to
ref_transaction_add_update()) is not described in the commit message
and it requires a number of changes to the callers of this function,
so I think it might want to be in its own preparatory commit before
this one.

> +                                             const char *msg)
>  {
>         struct ref_update *update;
>
> @@ -1190,8 +1192,15 @@ struct ref_update *ref_transaction_add_update(
>                 oidcpy(&update->new_oid, new_oid);
>         if ((flags & REF_HAVE_OLD) && old_oid)
>                 oidcpy(&update->old_oid, old_oid);
> -       if (!(flags & REF_SKIP_CREATE_REFLOG))
> +       if (!(flags & REF_SKIP_CREATE_REFLOG)) {
> +               if (committer_info) {
> +                       struct strbuf sb = STRBUF_INIT;
> +                       strbuf_addstr(&sb, committer_info);
> +                       update->committer_info = strbuf_detach(&sb, NULL);

Maybe:
                      update->committer_info = xstrdup(committer_info);

> +               }
> +
>                 update->msg = normalize_reflog_message(msg);
> +       }
>
>         return update;
>  }
> @@ -1199,20 +1208,29 @@ struct ref_update *ref_transaction_add_update(
>  static int transaction_refname_verification(const char *refname,
>                                             const struct object_id *new_oid,
>                                             unsigned int flags,
> +                                           unsigned int reflog,
>                                             struct strbuf *err)
>  {
>         if (flags & REF_SKIP_REFNAME_VERIFICATION)
>                 return 0;
>
>         if (is_pseudo_ref(refname)) {
> -               strbuf_addf(err, _("refusing to update pseudoref '%s'"),
> -                           refname);
> +               if (reflog)
> +                       strbuf_addf(err, _("refusing to update reflog for pseudoref '%s'"),
> +                                   refname);
> +               else
> +                       strbuf_addf(err, _("refusing to update pseudoref '%s'"),
> +                                   refname);

Maybe:

              const char *what = reflog ? "reflog for pseudoref" : "pseudoref";
              strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);

>                 return -1;
>         } else if ((new_oid && !is_null_oid(new_oid)) ?
>                  check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
>                  !refname_is_safe(refname)) {
> -               strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
> -                           refname);
> +               if (reflog)
> +                       strbuf_addf(err, _("refusing to update reflog with bad name '%s'"),
> +                                   refname);
> +               else
> +                       strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
> +                                   refname);

Maybe:

              const char *what = reflog ? "reflog with bad name" : "ref with bad name";
              strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);

>                 return -1;
>         }

[...]

>  int ref_transaction_create(struct ref_transaction *transaction,
> -                          const char *refname,
> -                          const struct object_id *new_oid,
> -                          const char *new_target,
> -                          unsigned int flags, const char *msg,
> -                          struct strbuf *err)
> +                          const char *refname, const struct object_id *new_oid,
> +                          const char *new_target, unsigned int flags,
> +                          const char *msg, struct strbuf *err)

This looks like a wrapping or indenting only change. If you really
want to do it, it should probably be in its own preparatory commit.

> index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..b86d2cd87be33f7bb1b31fce711d6c7c8d9491c9 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -727,6 +727,18 @@ int ref_transaction_update(struct ref_transaction *transaction,
>                            unsigned int flags, const char *msg,
>                            struct strbuf *err);
>
> +/*
> + * Similar to `ref_transaction_update`, but this function is only for adding
> + * a reflog updates.

"a reflog update" or "reflog updates".

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 1/7] refs: include committer info in `ref_update` struct
  2024-12-10 16:51   ` Christian Couder
@ 2024-12-11 10:13     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 10:13 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 6403 bytes --]

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Dec 9, 2024 at 12:10 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
>
>> If there is no `committer_info`
>> provided, the reference backends default to using
>> `git_committer_info(0)`. The field itself cannot be set to
>> `git_committer_info(0)` since the values are dynamic and must be
>> obtained right when the reflog is being committed.
>
>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..13f8539e6caa923cd4834775fcb0cd7f90d82014 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
>>         struct strbuf sb = STRBUF_INIT;
>>         int ret = 0;
>>
>> +       if (!committer)
>> +               committer = git_committer_info(0);
>
> It looks like this is where we obtain the value "right when the reflog
> is being committed".
>
>> +
>>         strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
>>         if (msg && *msg) {
>>                 strbuf_addch(&sb, '\t');
>> @@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
>>  }
>>
>>  static int files_log_ref_write(struct files_ref_store *refs,
>> -                              const char *refname, const struct object_id *old_oid,
>> -                              const struct object_id *new_oid, const char *msg,
>> +                              const char *refname,
>> +                              const struct object_id *old_oid,
>> +                              const struct object_id *new_oid,
>> +                              const char *committer_info, const char *msg,
>>                                int flags, struct strbuf *err)
>>  {
>>         int logfd, result;
>> @@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
>>
>>         if (logfd < 0)
>>                 return 0;
>> -       result = log_ref_write_fd(logfd, old_oid, new_oid,
>> -                                 git_committer_info(0), msg);
>> +       result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);
>
> Here we just pass the committer_info to the above function.
>
>>         if (result) {
>>                 struct strbuf sb = STRBUF_INIT;
>>                 int save_errno = errno;
>> @@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
>>         files_assert_main_repository(refs, "commit_ref_update");
>>
>>         clear_loose_ref_cache(refs);
>> -       if (files_log_ref_write(refs, lock->ref_name,
>> -                               &lock->old_oid, oid,
>> +       if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
>>                                 logmsg, flags, err)) {
>
> Here we don't have the info so we pass NULL.
>
>>                 char *old_msg = strbuf_detach(err, NULL);
>>                 strbuf_addf(err, "cannot update the ref '%s': %s",
>> @@ -2007,8 +2010,8 @@ static int commit_ref_update(struct files_ref_store *refs,
>>                 if (head_ref && (head_flag & REF_ISSYMREF) &&
>>                     !strcmp(head_ref, lock->ref_name)) {
>>                         struct strbuf log_err = STRBUF_INIT;
>> -                       if (files_log_ref_write(refs, "HEAD",
>> -                                               &lock->old_oid, oid,
>> +                       if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
>> +                                               oid, git_committer_info(0),
>
> Here we don't have the info either, so I think we should also pass
> NULL. It would then be computed "right when the reflog is being
> committed" in the above function. No?
>

Indeed, passing NULL should be sufficient here, good catch.

>>                                                 logmsg, flags, &log_err)) {
>>                                 error("%s", log_err.buf);
>>                                 strbuf_release(&log_err);
>
>
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>
> It is not your fault but write_transaction_table() does the following
> right at the beginning of the function:
>
>        committer_info = git_committer_info(0);
>        if (split_ident_line(&committer_ident, committer_info, strlen(committer_info)))
>                BUG("failed splitting committer info");
>
> but then 'committer_ident' is only used in the hunk you are changing:
>
>>                         if (create_reflog) {
>> +                               struct ident_split c;
>> +
>>                                 ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
>>                                 log = &logs[logs_nr++];
>>                                 memset(log, 0, sizeof(*log));
>>
>> -                               fill_reftable_log_record(log, &committer_ident);
>> +                               if (u->committer_info) {
>> +                                       if (split_ident_line(&c, u->committer_info,
>> +                                                            strlen(u->committer_info)))
>> +                                               BUG("failed splitting committer info");
>> +                               } else {
>
> I would think it would be more efficient to only compute
> 'committer_ident' here, right before we use it if needed. Or is there
> something I am missing?
>

It would if there wasn't a loop. Since we loop over multiple updates,
computing committer_ident for each would end up being expensive. So it
is done before the loop starts.

>> +                                       c = committer_ident;
>> +                               }
>> +
>> +                               fill_reftable_log_record(log, &c);
>>                                 log->update_index = ts;
>>                                 log->refname = xstrdup(u->refname);
>>                                 memcpy(log->value.update.new_hash,

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 3/7] refs/files: add count field to ref_lock
  2024-12-10 17:22   ` Christian Couder
@ 2024-12-11 10:18     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 10:18 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>>
>> When refs are updated in the files-backend, a lock is obtained for the
>> corresponding file path. This is the case even for reflogs, i.e. a lock
>> is obtained on the reference path instead of the reflog path. This
>> works, since generally, reflogs are updated alongside the ref.
>>
>> The upcoming patches will add support for reflog updates in ref
>> transaction. This means, in a particular transaction we want to have ref
>> updates and reflog updates. For refs, in a given transaction there can
>> only be one update. But, we can theoretically have multiple reflog
>> updates in a given transaction.
>
> Nit: Giving an example might help understand where multiple reflog
> updates can happen in a given transaction. Alternatively pointing to
> an existing doc that contains such an example or explanations might
> help too.
>

The use-case is added in the series. I've added a note about how this is
needed in reflog migration.

>> @@ -2572,18 +2588,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>                         goto out;
>>         }
>>
>> -       ret = lock_raw_ref(refs, update->refname, mustexist,
>> -                          affected_refnames,
>> -                          &lock, &referent,
>> -                          &update->type, err);
>> -       if (ret) {
>> -               char *reason;
>> +       lock = strmap_get(&backend_data->ref_locks, update->refname);
>> +       if (lock) {
>> +               lock->count = lock->count + 1;
>
> Nit:
>               lock->count++;
>

Will fix, thanks.

[snip]

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 3/7] refs/files: add count field to ref_lock
  2024-12-11  9:05   ` Christian Couder
@ 2024-12-11 10:26     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 10:26 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>>
>> When refs are updated in the files-backend, a lock is obtained for the
>> corresponding file path. This is the case even for reflogs, i.e. a lock
>> is obtained on the reference path instead of the reflog path. This
>> works, since generally, reflogs are updated alongside the ref.
>>
>> The upcoming patches will add support for reflog updates in ref
>> transaction. This means, in a particular transaction we want to have ref
>> updates and reflog updates. For refs, in a given transaction there can
>> only be one update.
>
> Maybe something like: "For a given ref in a given transaction there
> can be at most one update."
>

Sure.

>> But, we can theoretically have multiple reflog
>> updates in a given transaction.
>
> And: "But we can theoretically have multiple reflog updates for a
> given ref in a given transaction."
>

Will add.

>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 13f8539e6caa923cd4834775fcb0cd7f90d82014..9c929c1ac33bc62a75620e684a809d46b574f1c6 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -71,6 +71,8 @@ struct ref_lock {
>>         char *ref_name;
>>         struct lock_file lk;
>>         struct object_id old_oid;
>> +       /* count keeps track of users of the lock */
>> +       unsigned int count;
>
> Nit: maybe the following is a bit better:
>
>       unsigned int count; /* track users of the lock (ref update + reflog updates) */

This is better, will amend this in too!

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 4/7] refs: extract out refname verification in transactions
  2024-12-11  9:26   ` Christian Couder
@ 2024-12-11 10:31     ` karthik nayak
  2024-12-11 14:26     ` Patrick Steinhardt
  1 sibling, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 10:31 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2285 bytes --]

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>>
>> Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
>> the refname of the update is verified for:
>>
>>   - Ensuring it is not a pseudoref.
>>   - Checking the refname format.
>>
>> These checks are also be needed in a following commit where the function
>
> s/are also be needed/will also be needed/
>

Will amend.

>> to add reflog updates to the transaction is introduced. Extract the code
>> out into a new static function.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  refs.c | 43 ++++++++++++++++++++++++++++---------------
>>  1 file changed, 28 insertions(+), 15 deletions(-)
>>
>> diff --git a/refs.c b/refs.c
>> index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..732c236a3fd0cf324cc172b48d3d54f6dbadf4a4 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1196,6 +1196,29 @@ struct ref_update *ref_transaction_add_update(
>>         return update;
>>  }
>>
>> +static int transaction_refname_verification(const char *refname,
>> +                                           const struct object_id *new_oid,
>> +                                           unsigned int flags,
>> +                                           struct strbuf *err)
>
> We have a number of functions named 'xxx_valid()' or 'xxx_ok()' while
> I couldn't find any 'yyy_verification()' function, so it might be
> better to name it 'transaction_refname_valid()' or maybe
> 'transaction_refname_ok()'.
>

I think you're right, it helps to be consistent here. Will change to
`transaction_refname_valid()`.

> Also I think it should probably return a bool so 1 if the refname is
> valid and 0 otherwise, unless we have plans in the future to follow
> different code paths depending on the different ways it is not valid.
>

That is a good idea.

>> +       ret = transaction_refname_verification(refname, new_oid, flags, err);
>> +       if (ret)
>> +               return ret;
>
> Then the above could be just:
>
>        if (!transaction_refname_valid(refname, new_oid, flags, err))
>                return -1;

Yup, also will remove the need for the `ret` variable.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname
  2024-12-09 11:07 ` [PATCH 6/7] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-11 10:44   ` Christian Couder
  2024-12-12 14:52     ` karthik nayak
  2024-12-11 14:26   ` Patrick Steinhardt
  1 sibling, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-11 10:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> The reference transaction only allows a update for a given reference to

s/a update/an update/

or: s/a update/a single update/

> avoid conflicts. This, however, isn't an issue for reflogs. There are no
> conflicts to be resolved in reflogs and when migrating reflogs between
> backends we'd have multiple reflog entries for the same refname.


> @@ -1302,6 +1303,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>         struct ident_split committer_ident = {0};
>         size_t logs_nr = 0, logs_alloc = 0, i;
>         const char *committer_info;
> +       struct strintmap logs_ts;

Here a comment might help explain what logs_ts is used for.

>         int ret = 0;
>
>         committer_info = git_committer_info(0);
> @@ -1310,6 +1312,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>
>         QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
>
> +       strintmap_init(&logs_ts, ts);

I am not sure I understand what logs_ts is used for and why its
default value is set to ts.

Also ts is a uint64_t while the second argument to strintmap_init()
is an int. I wonder if it could be an issue, especially on 32-bit
platforms.
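
The width concern can be made concrete with a small range check. This is a
sketch under assumptions: `fits_in_int()` and `to_int_checked()` are
hypothetical helpers, not existing Git functions. Note that on common ABIs
`int` is 32 bits wide even on 64-bit platforms, so the question is not limited
to 32-bit builds.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if `value` can be stored in an `int` without loss. */
static int fits_in_int(uint64_t value)
{
	return value <= (uint64_t)INT_MAX;
}

/* Convert to `int`, refusing values that would not survive the trip. */
static int to_int_checked(uint64_t value)
{
	assert(fits_in_int(value));
	return (int)value;
}
```

Any update index above INT_MAX would fail this check, which is the case where
storing it in an int-valued map would lose information.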

>         reftable_writer_set_limits(writer, ts, ts);
>
>         for (i = 0; i < arg->updates_nr; i++) {
> @@ -1391,6 +1395,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>
>                         if (create_reflog) {
>                                 struct ident_split c;
> +                               uint64_t update_index;
>
>                                 ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
>                                 log = &logs[logs_nr++];
> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>                                 }
>
>                                 fill_reftable_log_record(log, &c);
> -                               log->update_index = ts;
> +
> +                               update_index = strintmap_get(&logs_ts, u->refname);
> +                               log->update_index = update_index;
> +                               strintmap_set(&logs_ts, u->refname, update_index+1);

s/update_index+1/update_index + 1/

Also is the 'update_index' var really needed or could we just do:

                               log->update_index = strintmap_get(&logs_ts, u->refname);
                               strintmap_set(&logs_ts, u->refname, log->update_index + 1);

?

>                                 log->refname = xstrdup(u->refname);
>                                 memcpy(log->value.update.new_hash,
>                                        u->new_oid.hash, GIT_MAX_RAWSZ);
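
Reading the hunks above, the map appears to hand out one update index per
reflog entry of a refname: the first entry for a refname gets the default
(`ts`), and every further entry for the same refname gets the next value. A
minimal sketch of that get-then-increment pattern follows. It is an assumption
for illustration: `struct next_index_map` and `take_next_index()` are
hypothetical and use a tiny fixed-size array instead of Git's `strintmap`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_REFS 16

/*
 * Hypothetical fixed-size map from refname to the next free index; it
 * stands in for a string-to-int map. Names are borrowed, not copied.
 */
struct next_index_map {
	const char *names[MAX_REFS];
	uint64_t next[MAX_REFS];
	size_t nr;
	uint64_t default_index; /* index given to the first entry of a refname */
};

/* Return the index for the next log entry of `refname` and advance it. */
static uint64_t take_next_index(struct next_index_map *map, const char *refname)
{
	size_t i;

	for (i = 0; i < map->nr; i++)
		if (!strcmp(map->names[i], refname))
			return map->next[i]++;

	/* First entry for this refname: start from the default. */
	assert(map->nr < MAX_REFS);
	map->names[map->nr] = refname;
	map->next[map->nr] = map->default_index + 1;
	map->nr++;
	return map->default_index;
}
```

With a default of 10, three entries for one refname would receive 10, 11 and
12, while the first entry for a different refname would again start at 10.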

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-09 11:07 ` [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
  2024-12-11 10:10   ` Christian Couder
@ 2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-11 18:09     ` karthik nayak
  1 sibling, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-11 14:26 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 09, 2024 at 12:07:19PM +0100, Karthik Nayak wrote:
> diff --git a/refs.c b/refs.c
> index 732c236a3fd0cf324cc172b48d3d54f6dbadf4a4..602a65873181a90751def525608a7fa7bea59562 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1160,13 +1160,15 @@ void ref_transaction_free(struct ref_transaction *transaction)
>  	free(transaction);
>  }
>  
> -struct ref_update *ref_transaction_add_update(
> -		struct ref_transaction *transaction,
> -		const char *refname, unsigned int flags,
> -		const struct object_id *new_oid,
> -		const struct object_id *old_oid,
> -		const char *new_target, const char *old_target,
> -		const char *msg)
> +struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
> +					      const char *refname,
> +					      unsigned int flags,
> +					      const struct object_id *new_oid,
> +					      const struct object_id *old_oid,
> +					      const char *new_target,
> +					      const char *old_target,
> +					      const char *committer_info,
> +					      const char *msg)
>  {
>  	struct ref_update *update;
>  

I'd personally avoid reindenting this block. It's somewhat-common
practice to not align all arguments with the opening brace when the line
would become too long. The reindents also distract a bit from the actual
changes done in other places further down.

> @@ -1190,8 +1192,15 @@ struct ref_update *ref_transaction_add_update(
>  		oidcpy(&update->new_oid, new_oid);
>  	if ((flags & REF_HAVE_OLD) && old_oid)
>  		oidcpy(&update->old_oid, old_oid);
> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
> +		if (committer_info) {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_addstr(&sb, committer_info);
> +			update->committer_info = strbuf_detach(&sb, NULL);

Can't we simplify this via `xstrdup()`?
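
To illustrate the simplification being asked for, here is a minimal sketch
in plain C. It assumes nothing from Git itself: `dup_or_die()` is a
hypothetical stand-in for `xstrdup()`, and `struct update_sketch` is a
hypothetical, cut-down version of the structure touched by the patch. The
point is only that a single duplicate-or-die call gives the update its own
copy of the string, which is all the strbuf round trip achieved.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's xstrdup(): duplicate or die on OOM. */
static char *dup_or_die(const char *s)
{
	size_t len;
	char *copy;

	assert(s != NULL);
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy) {
		fprintf(stderr, "out of memory\n");
		exit(1);
	}
	memcpy(copy, s, len);
	return copy;
}

/* Hypothetical, cut-down version of the struct touched by the patch. */
struct update_sketch {
	char *committer_info; /* owned copy; NULL when none was passed */
};

/* One call replaces the STRBUF_INIT / strbuf_addstr / strbuf_detach trio. */
static void set_committer_info(struct update_sketch *update,
			       const char *committer_info)
{
	update->committer_info = committer_info ?
		dup_or_die(committer_info) : NULL;
}
```

The caller keeps ownership of its own string, and the update must free its
copy when it is released.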

> @@ -3080,10 +3081,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		}
>  
>  		/*
> -		 * packed-refs don't support symbolic refs and root refs, so we
> -		 * have to queue these references via the loose transaction.
> +		 * packed-refs don't support symbolic refs, root refs and reflogs,
> +		 * so we have to queue these references via the loose transaction.
>  		 */
> -		if (update->new_target || is_root_ref(update->refname)) {
> +		if (update->new_target ||
> +		    is_root_ref(update->refname) ||
> +		    (update->flags & REF_LOG_ONLY)) {
>  			if (!loose_transaction) {
>  				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
>  				if (!loose_transaction) {

Makes sense. While we already had REF_LOG_ONLY beforehand, it was only
used in very specific cases and thus the support implemented by the
backends is lacking. And given that the packed-ref backend does not
support reflogs we have to queue these up via the loose backend.

Patrick

* Re: [PATCH 4/7] refs: extract out refname verification in transactions
  2024-12-11  9:26   ` Christian Couder
  2024-12-11 10:31     ` karthik nayak
@ 2024-12-11 14:26     ` Patrick Steinhardt
  1 sibling, 0 replies; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-11 14:26 UTC (permalink / raw)
  To: Christian Couder; +Cc: Karthik Nayak, git, toon, Christian Couder

On Wed, Dec 11, 2024 at 10:26:31AM +0100, Christian Couder wrote:
> On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
> > diff --git a/refs.c b/refs.c
> > index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..732c236a3fd0cf324cc172b48d3d54f6dbadf4a4 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1196,6 +1196,29 @@ struct ref_update *ref_transaction_add_update(
> >         return update;
> >  }
> >
> > +static int transaction_refname_verification(const char *refname,
> > +                                           const struct object_id *new_oid,
> > +                                           unsigned int flags,
> > +                                           struct strbuf *err)
> 
> We have a number of functions named 'xxx_valid()' or 'xxx_ok()' while
> I couldn't find any 'yyy_verification()' function, so it might be
> better to name it 'transaction_refname_valid()' or maybe
> 'transaction_refname_ok()'.
> 
> Also I think it should probably return a bool so 1 if the refname is
> valid and 0 otherwise, unless we have plans in the future to follow
> different code paths depending on the different ways it is not valid.
> 
> > +       ret = transaction_refname_verification(refname, new_oid, flags, err);
> > +       if (ret)
> > +               return ret;
> 
> Then the above could be just:
> 
>        if (!transaction_refname_valid(refname, new_oid, flags, err))
>                return -1;

The only question is whether we want to discern between an invalid
refname and an error. But reading through the code it doesn't seem like
we want to do that as we always return either `-1` on invalid names or
`0` otherwise.

So agreed, this is a good suggestion.
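
For reference, the convention agreed on here can be sketched as follows.
This is only an illustration under stated assumptions: the rule set in
`refname_valid_sketch()` is a toy one and not Git's
`check_refname_format()`, the error buffer is a plain char array rather
than a strbuf, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy validity check (NOT Git's real rules): returns 1 if the name is
 * acceptable and 0 otherwise, filling `err` in the latter case.
 */
static int refname_valid_sketch(const char *refname, char *err, size_t errlen)
{
	size_t len;

	assert(refname != NULL && err != NULL);
	len = strlen(refname);
	if (!len || strstr(refname, "..") || refname[len - 1] == '/') {
		snprintf(err, errlen,
			 "refusing to update ref with bad name '%s'", refname);
		return 0;
	}
	return 1;
}

/* Hypothetical caller: every invalid name maps to the same -1. */
static int queue_update_sketch(const char *refname, char *err, size_t errlen)
{
	if (!refname_valid_sketch(refname, err, errlen))
		return -1;
	/* ... the real code would queue the update here ... */
	return 0;
}
```

Because the caller never distinguishes between kinds of failure, a boolean
result loses no information and reads naturally at the call site.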

Patrick

* Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname
  2024-12-09 11:07 ` [PATCH 6/7] refs: allow multiple reflog entries for the same refname Karthik Nayak
  2024-12-11 10:44   ` Christian Couder
@ 2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-12 14:47     ` karthik nayak
  1 sibling, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-11 14:26 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 09, 2024 at 12:07:20PM +0100, Karthik Nayak wrote:
> The reference transaction only allows a update for a given reference to
> avoid conflicts. This, however, isn't an issue for reflogs. There are no
> conflicts to be resolved in reflogs and when migrating reflogs between
> backends we'd have multiple reflog entries for the same refname.
> 
> So allow multiple reflog updates within a single transaction. Also the
> reflog creation logic isn't exposed to the end user. While this might
> change in the future, currently, this reduces the scope of issues to
> think about.
> 
> This is required to add reflog migration support to `git refs migrate`
> which currently doesn't support it.

Nit: the second half of this sentence starting with "which currently..."
feels rather pointless, as it's implicit in the first half. I'd just
drop it.

> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/files-backend.c    | 15 +++++++++++----
>  refs/reftable-backend.c | 16 +++++++++++++---
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  
>  	update->backend_data = lock;
>  
> +	if (update->flags & REF_LOG_ONLY)
> +		goto out;
> +
>  	if (update->type & REF_ISSYMREF) {
>  		if (update->flags & REF_NO_DEREF) {
>  			/*

Hm. Does this mean that we don't lock at all for REF_LOG_ONLY updates?
Reflogs themselves have no lockfile, so isn't it mandatory that we lock
the corresponding ref like we used to do? Otherwise I cannot see how we
avoid two concurrent writers to the same reflog.

> @@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  
>  	/* Fail if a refname appears more than once in the transaction: */
>  	for (i = 0; i < transaction->nr; i++)
> -		string_list_append(&affected_refnames,
> -				   transaction->updates[i]->refname);
> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> +			string_list_append(&affected_refnames,
> +					   transaction->updates[i]->refname);
>  	string_list_sort(&affected_refnames);
>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;

This on the other hand is sensible -- having multiple REF_LOG_ONLY
transactions queued is fine.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  				}
>  
>  				fill_reftable_log_record(log, &c);
> -				log->update_index = ts;
> +
> +				update_index = strintmap_get(&logs_ts, u->refname);
> +				log->update_index = update_index;
> +				strintmap_set(&logs_ts, u->refname, update_index+1);

So we're now tracking update indices via another map in order to ensure
that the update index will be increased if we have multiple reflog
entries for the same refname. Can we avoid that overhead by instead just
having a global update index counter that increases for every single
reflog entry, regardless of whether we have multiple ones queued up for
the same reference?

I guess the result would be kind of weird as a single transaction with
multiple ref updates would now always contain N different update
indices. Maybe there's an alternative that allows us to reduce the cost,
like only doing this for REF_LOG_ONLY updates?

I'm mostly being careful because this here is the hot loop of writing
refs, so I don't want to regress performance.

Patrick

* Re: [PATCH 7/7] refs: add support for migrating reflogs
  2024-12-09 11:07 ` [PATCH 7/7] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-11 14:26   ` Patrick Steinhardt
  2024-12-12 14:04     ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-11 14:26 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, Christian Couder

On Mon, Dec 09, 2024 at 12:07:21PM +0100, Karthik Nayak wrote:
> @@ -2687,6 +2688,7 @@ int ref_update_check_old_target(const char *referent, struct ref_update *update,
>  }
>  
>  struct migration_data {
> +	unsigned int index;
>  	struct ref_store *old_refs;
>  	struct ref_transaction *transaction;
>  	struct strbuf *errbuf;

Calling this `reflog_index` might be a bit easier for context.

> @@ -2868,8 +2894,8 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>  	 *   1. Set up a new temporary directory and initialize it with the new
>  	 *      format. This is where all refs will be migrated into.
>  	 *
> -	 *   2. Enumerate all refs and write them into the new ref storage.
> -	 *      This operation is safe as we do not yet modify the main
> +	 *   2. Enumerate all refs and reflogs and write them into the new ref
> +	 *      storage. This operation is safe as we do not yet modify the main

I'd rather move this into a third step, as it is separate from the ref
enumeration.

Patrick

* Re: [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-11 10:10   ` Christian Couder
@ 2024-12-11 18:06     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 18:06 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

[snip]

>> diff --git a/refs.c b/refs.c
>> index 732c236a3fd0cf324cc172b48d3d54f6dbadf4a4..602a65873181a90751def525608a7fa7bea59562 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1160,13 +1160,15 @@ void ref_transaction_free(struct ref_transaction *transaction)
>>         free(transaction);
>>  }
>>
>> -struct ref_update *ref_transaction_add_update(
>> -               struct ref_transaction *transaction,
>> -               const char *refname, unsigned int flags,
>> -               const struct object_id *new_oid,
>> -               const struct object_id *old_oid,
>> -               const char *new_target, const char *old_target,
>> -               const char *msg)
>> +struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
>> +                                             const char *refname,
>> +                                             unsigned int flags,
>> +                                             const struct object_id *new_oid,
>> +                                             const struct object_id *old_oid,
>> +                                             const char *new_target,
>> +                                             const char *old_target,
>> +                                             const char *committer_info,
>
> This change (adding a 'const char *committer_info' argument to
> ref_transaction_add_update()) is not described in the commit message
> and it requires a number of changes to the callers of this function,
> so I think it might want to be in its own preparatory commit before
> this one.
>

I think this is a great suggestion; it would reduce the cognitive load
of the commit and make it easier to review. Will do.

>> +                                             const char *msg)
>>  {
>>         struct ref_update *update;
>>
>> @@ -1190,8 +1192,15 @@ struct ref_update *ref_transaction_add_update(
>>                 oidcpy(&update->new_oid, new_oid);
>>         if ((flags & REF_HAVE_OLD) && old_oid)
>>                 oidcpy(&update->old_oid, old_oid);
>> -       if (!(flags & REF_SKIP_CREATE_REFLOG))
>> +       if (!(flags & REF_SKIP_CREATE_REFLOG)) {
>> +               if (committer_info) {
>> +                       struct strbuf sb = STRBUF_INIT;
>> +                       strbuf_addstr(&sb, committer_info);
>> +                       update->committer_info = strbuf_detach(&sb, NULL);
>
> Maybe:
>                       update->committer_info = xstrdup(committer_info);
>

Indeed, I thought there was a better way. This is what I should have done.

>> +               }
>> +
>>                 update->msg = normalize_reflog_message(msg);
>> +       }
>>
>>         return update;
>>  }
>> @@ -1199,20 +1208,29 @@ struct ref_update *ref_transaction_add_update(
>>  static int transaction_refname_verification(const char *refname,
>>                                             const struct object_id *new_oid,
>>                                             unsigned int flags,
>> +                                           unsigned int reflog,
>>                                             struct strbuf *err)
>>  {
>>         if (flags & REF_SKIP_REFNAME_VERIFICATION)
>>                 return 0;
>>
>>         if (is_pseudo_ref(refname)) {
>> -               strbuf_addf(err, _("refusing to update pseudoref '%s'"),
>> -                           refname);
>> +               if (reflog)
>> +                       strbuf_addf(err, _("refusing to update reflog for pseudoref '%s'"),
>> +                                   refname);
>> +               else
>> +                       strbuf_addf(err, _("refusing to update pseudoref '%s'"),
>> +                                   refname);
>
> Maybe:
>
>               const char *what = reflog ? "reflog for pseudoref" : "pseudoref";
>               strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
>

Much nicer, will add.

>>                 return -1;
>>         } else if ((new_oid && !is_null_oid(new_oid)) ?
>>                  check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
>>                  !refname_is_safe(refname)) {
>> -               strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
>> -                           refname);
>> +               if (reflog)
>> +                       strbuf_addf(err, _("refusing to update reflog with bad name '%s'"),
>> +                                   refname);
>> +               else
>> +                       strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
>> +                                   refname);
>
> Maybe:
>
>               const char *what = reflog ? "reflog with bad name" : "ref with bad name";
>               strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
>

Similar, will do.

>>                 return -1;
>>         }
>
> [...]
>
>>  int ref_transaction_create(struct ref_transaction *transaction,
>> -                          const char *refname,
>> -                          const struct object_id *new_oid,
>> -                          const char *new_target,
>> -                          unsigned int flags, const char *msg,
>> -                          struct strbuf *err)
>> +                          const char *refname, const struct object_id *new_oid,
>> +                          const char *new_target, unsigned int flags,
>> +                          const char *msg, struct strbuf *err)
>
> This looks like a wrapping or indenting only change. If you really
> want to do it, it should probably be in its own preparatory commit.
>

I think it was the auto linter, will remove.

>> index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..b86d2cd87be33f7bb1b31fce711d6c7c8d9491c9 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -727,6 +727,18 @@ int ref_transaction_update(struct ref_transaction *transaction,
>>                            unsigned int flags, const char *msg,
>>                            struct strbuf *err);
>>
>> +/*
>> + * Similar to `ref_transaction_update`, but this function is only for adding
>> + * a reflog updates.
>
> "a reflog update" or "reflog updates".

Makes sense. Thanks.

* Re: [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-11 14:26   ` Patrick Steinhardt
@ 2024-12-11 18:09     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-11 18:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Dec 09, 2024 at 12:07:19PM +0100, Karthik Nayak wrote:
>> diff --git a/refs.c b/refs.c
>> index 732c236a3fd0cf324cc172b48d3d54f6dbadf4a4..602a65873181a90751def525608a7fa7bea59562 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1160,13 +1160,15 @@ void ref_transaction_free(struct ref_transaction *transaction)
>>  	free(transaction);
>>  }
>>
>> -struct ref_update *ref_transaction_add_update(
>> -		struct ref_transaction *transaction,
>> -		const char *refname, unsigned int flags,
>> -		const struct object_id *new_oid,
>> -		const struct object_id *old_oid,
>> -		const char *new_target, const char *old_target,
>> -		const char *msg)
>> +struct ref_update *ref_transaction_add_update(struct ref_transaction *transaction,
>> +					      const char *refname,
>> +					      unsigned int flags,
>> +					      const struct object_id *new_oid,
>> +					      const struct object_id *old_oid,
>> +					      const char *new_target,
>> +					      const char *old_target,
>> +					      const char *committer_info,
>> +					      const char *msg)
>>  {
>>  	struct ref_update *update;
>>
>
> I'd personally avoid reindenting this block. It's somewhat-common
> practice to not align all arguments with the opening brace when the line
> would become too long. The reindents also distract a bit from the actual
> changes done in other places further down.
>

Makes sense, I'll undo that.

>> @@ -1190,8 +1192,15 @@ struct ref_update *ref_transaction_add_update(
>>  		oidcpy(&update->new_oid, new_oid);
>>  	if ((flags & REF_HAVE_OLD) && old_oid)
>>  		oidcpy(&update->old_oid, old_oid);
>> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
>> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
>> +		if (committer_info) {
>> +			struct strbuf sb = STRBUF_INIT;
>> +			strbuf_addstr(&sb, committer_info);
>> +			update->committer_info = strbuf_detach(&sb, NULL);
>
> Can't we simplify this via `xstrdup()`?
>

Yup, Christian suggested the same too, will fix it up.

>> @@ -3080,10 +3081,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>>  		}
>>
>>  		/*
>> -		 * packed-refs don't support symbolic refs and root refs, so we
>> -		 * have to queue these references via the loose transaction.
>> +		 * packed-refs don't support symbolic refs, root refs and reflogs,
>> +		 * so we have to queue these references via the loose transaction.
>>  		 */
>> -		if (update->new_target || is_root_ref(update->refname)) {
>> +		if (update->new_target ||
>> +		    is_root_ref(update->refname) ||
>> +		    (update->flags & REF_LOG_ONLY)) {
>>  			if (!loose_transaction) {
>>  				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
>>  				if (!loose_transaction) {
>
> Makes sense. While we already had REF_LOG_ONLY beforehand, it was only
> used in very specific cases and thus the support implemented by the
> backends is lacking. And given that the packed-ref backend does not
> support reflogs we have to queue these up via the loose backend.
>
> Patrick

* Re: [PATCH 7/7] refs: add support for migrating reflogs
  2024-12-11 14:26   ` Patrick Steinhardt
@ 2024-12-12 14:04     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-12 14:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Dec 09, 2024 at 12:07:21PM +0100, Karthik Nayak wrote:
>> @@ -2687,6 +2688,7 @@ int ref_update_check_old_target(const char *referent, struct ref_update *update,
>>  }
>>
>>  struct migration_data {
>> +	unsigned int index;
>>  	struct ref_store *old_refs;
>>  	struct ref_transaction *transaction;
>>  	struct strbuf *errbuf;
>
> Calling this `reflog_index` might be a bit easier for context.
>

Makes sense, will amend.

>> @@ -2868,8 +2894,8 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>>  	 *   1. Set up a new temporary directory and initialize it with the new
>>  	 *      format. This is where all refs will be migrated into.
>>  	 *
>> -	 *   2. Enumerate all refs and write them into the new ref storage.
>> -	 *      This operation is safe as we do not yet modify the main
>> +	 *   2. Enumerate all refs and reflogs and write them into the new ref
>> +	 *      storage. This operation is safe as we do not yet modify the main
>
> I'd rather move this into a third step, as it is separate from the ref
> enumeration.
>
> Patrick
>

I put them together since it is a single transaction, but I'm okay with
making this change; will amend. Thanks.

* Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname
  2024-12-11 14:26   ` Patrick Steinhardt
@ 2024-12-12 14:47     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-12 14:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Dec 09, 2024 at 12:07:20PM +0100, Karthik Nayak wrote:
>> The reference transaction only allows a update for a given reference to
>> avoid conflicts. This, however, isn't an issue for reflogs. There are no
>> conflicts to be resolved in reflogs and when migrating reflogs between
>> backends we'd have multiple reflog entries for the same refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> This is required to add reflog migration support to `git refs migrate`
>> which currently doesn't support it.
>
> Nit: the second half of this sentence starting with "which currently..."
> feels rather pointless, as it's implicit in the first half. I'd just
> drop it.
>

Will do.

>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  refs/files-backend.c    | 15 +++++++++++----
>>  refs/reftable-backend.c | 16 +++++++++++++---
>>  2 files changed, 24 insertions(+), 7 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>
>>  	update->backend_data = lock;
>>
>> +	if (update->flags & REF_LOG_ONLY)
>> +		goto out;
>> +
>>  	if (update->type & REF_ISSYMREF) {
>>  		if (update->flags & REF_NO_DEREF) {
>>  			/*
>
> Hm. Does this mean that we don't lock at all for REF_LOG_ONLY updates?
> Reflogs themselves have no lockfile, so isn't it mandatory that we lock
> the corresponding ref like we used to do? Otherwise I cannot see how we
> avoid two concurrent writers to the same reflog.
>

No, it doesn't; this happens after the lock is obtained. We simply exit
early since, for reflog-only updates, there is nothing further to do.

>> @@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>>
>>  	/* Fail if a refname appears more than once in the transaction: */
>>  	for (i = 0; i < transaction->nr; i++)
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	string_list_sort(&affected_refnames);
>>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>>  		ret = TRANSACTION_GENERIC_ERROR;
>
> This on the other hand is sensible -- having multiple REF_LOG_ONLY
> transactions queued is fine.
>
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  				}
>>
>>  				fill_reftable_log_record(log, &c);
>> -				log->update_index = ts;
>> +
>> +				update_index = strintmap_get(&logs_ts, u->refname);
>> +				log->update_index = update_index;
>> +				strintmap_set(&logs_ts, u->refname, update_index+1);
>
> So we're now tracking update indices via another map in order to ensure
> that the update index will be increased if we have multiple reflog
> entries for the same refname. Can we avoid that overhead by instead just
> having a global update index counter that increases for every single
> reflog entry, regardless of whether we have multiple ones queued up for
> the same reference?
>
> I guess the result would be kind of weird as a single transaction with
> multiple ref updates would now always contain N different update
> indices. Maybe there's an alternative that allows us to reduce the cost,
> like only doing this for REF_LOG_ONLY updates?
>
> I'm mostly being careful because this here is the hot loop of writing
> refs, so I don't want to regress performance.

Thanks for bringing this up. I was thinking hard about this for a while.
I also did some local benchmarking: for 10000 atomic writes, I couldn't
find a noteworthy regression.

But I really like the point you made about how this could probably use a
counter. I think we can use `u->index`. The index field was added to
ensure that the logs stay in order when we sort inside
`write_transaction_table()`, and we can reuse it here. We can simply
do:

    log->update_index = ts + u->index;

This would increment the update_index as needed and would only kick in
when the `index` itself is set, which is only the case for reflog
migration between backends.

I think this would work well.
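
The idea can be sketched in standalone C as follows. This is an
illustration under assumptions, not the reftable code: the two structs are
hypothetical and cut down, and the comparison function (refname first, then
ascending update index) is an assumed ordering chosen only to show that
distinct update indices make the result of the sort deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down queued update. */
struct update_sketch {
	const char *refname;
	const char *msg;
	uint64_t index; /* order in which the reflog entry was queued */
};

/* Hypothetical, cut-down log record. */
struct log_sketch {
	const char *refname;
	const char *msg;
	uint64_t update_index;
};

/* Assumed ordering: by refname, then by ascending update index. */
static int log_cmp(const void *a, const void *b)
{
	const struct log_sketch *x = a, *y = b;
	int cmp = strcmp(x->refname, y->refname);

	if (cmp)
		return cmp;
	if (x->update_index == y->update_index)
		return 0;
	return x->update_index < y->update_index ? -1 : 1;
}

/* Derive each update index from `ts` plus the queue position, then sort. */
static void fill_and_sort_logs(const struct update_sketch *updates, size_t nr,
			       uint64_t ts, struct log_sketch *logs)
{
	size_t i;

	assert(updates != NULL && logs != NULL);
	for (i = 0; i < nr; i++) {
		logs[i].refname = updates[i].refname;
		logs[i].msg = updates[i].msg;
		logs[i].update_index = ts + updates[i].index;
	}
	qsort(logs, nr, sizeof(*logs), log_cmp);
}
```

With `index` left at zero for ordinary updates, every record still gets
`ts`, so only entries that carry an explicit index are affected.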

Karthik

* Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname
  2024-12-11 10:44   ` Christian Couder
@ 2024-12-12 14:52     ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-12 14:52 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, toon, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Dec 9, 2024 at 12:11 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>>
>> The reference transaction only allows a update for a given reference to
>
> s/a update/an update/
>
> or: s/a update/a single update/

'a single update' sounds the best here, will add.

>> avoid conflicts. This, however, isn't an issue for reflogs. There are no
>> conflicts to be resolved in reflogs and when migrating reflogs between
>> backends we'd have multiple reflog entries for the same refname.
>
>
>> @@ -1302,6 +1303,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>         struct ident_split committer_ident = {0};
>>         size_t logs_nr = 0, logs_alloc = 0, i;
>>         const char *committer_info;
>> +       struct strintmap logs_ts;
>
> Here a comment might help explain what logs_ts is used for.
>

I think with Patrick's comment, this whole code will be removed in favor
of something simpler.

>>         int ret = 0;
>>
>>         committer_info = git_committer_info(0);
>> @@ -1310,6 +1312,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>
>>         QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
>>
>> +       strintmap_init(&logs_ts, ts);
>
> I am not sure I understand what logs_ts is used for and why its
> default value is set to ts.
>

The reason I added this is that in the reftable backend, the writer
sorts logs before writing. So if multiple reflog entries had the same
update_index, their order might change. But for migrating reflogs, we
need to ensure we maintain the order. Using a map here allowed us to
increment the update_index for each reflog entry of a given refname.
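
A rough sketch of that per-refname counter, written without Git's
`strintmap`: the fixed-size table, `struct counter_sketch` and
`next_update_index()` are all hypothetical and exist only to show the
get-then-increment pattern with a default value of `ts`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REFS_SKETCH 16

/* Hypothetical replacement for the map: a small linear table. */
struct counter_sketch {
	const char *refname[MAX_REFS_SKETCH];
	uint64_t next[MAX_REFS_SKETCH];
	size_t nr;
	uint64_t default_value; /* plays the role of `ts` */
};

/*
 * Return the update index to use for `refname` and bump the stored
 * value, so a later entry for the same refname gets a higher index.
 */
static uint64_t next_update_index(struct counter_sketch *c, const char *refname)
{
	size_t i;

	for (i = 0; i < c->nr; i++)
		if (!strcmp(c->refname[i], refname))
			return c->next[i]++;

	assert(c->nr < MAX_REFS_SKETCH);
	c->refname[c->nr] = refname;
	c->next[c->nr] = c->default_value + 1;
	c->nr++;
	return c->default_value;
}
```

Keeping the counter as a `uint64_t` also sidesteps the width question
raised about storing the value in an `int`.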

> Also ts is an uint64_t while the second argument to strintmap_init()
> is an int. I wonder if it could be an issue especially on 32 bits
> platforms.
>

This is a fair point. I ultimately decided to scrap this and simply add
`u->index` to the update_index, which provides the same desired
effect.

>>         reftable_writer_set_limits(writer, ts, ts);
>>
>>         for (i = 0; i < arg->updates_nr; i++) {
>> @@ -1391,6 +1395,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>
>>                         if (create_reflog) {
>>                                 struct ident_split c;
>> +                               uint64_t update_index;
>>
>>                                 ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
>>                                 log = &logs[logs_nr++];
>> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>                                 }
>>
>>                                 fill_reftable_log_record(log, &c);
>> -                               log->update_index = ts;
>> +
>> +                               update_index = strintmap_get(&logs_ts, u->refname);
>> +                               log->update_index = update_index;
>> +                               strintmap_set(&logs_ts, u->refname, update_index+1);
>
> s/update_index+1/update_index + 1/
>
> Also is the 'update_index' var really needed or could we just do:
>
>                                log->update_index = strintmap_get(&logs_ts, u->refname);
>                                strintmap_set(&logs_ts, u->refname, log->update_index + 1);
>
> ?
>

The temp variable can be removed here indeed. But I'll remove all of
this in the next version. Thanks

>>                                 log->refname = xstrdup(u->refname);
>>                                 memcpy(log->value.update.new_hash,
>>                                        u->new_oid.hash, GIT_MAX_RAWSZ);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH v2 0/8] refs: add reflog support to `git refs migrate`
  2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
                   ` (7 preceding siblings ...)
  2024-12-10 12:13 ` [PATCH 0/7] refs: add reflog support to `git refs migrate` Junio C Hamano
@ 2024-12-13 10:36 ` Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
                     ` (8 more replies)
  8 siblings, 9 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the feature was that it didn't support migrating
repositories which contained reflogs. This isn't a requirement on the
server side as repositories are stored as bare repositories (which do
not contain any reflogs). Clients however generally use reflogs and
until now couldn't use the `git refs migrate` command to migrate their
repositories to the new reftable format.

One of the issues for adding reflog support is that the ref transactions
don't support reflog additions:
  1. While there is a REF_LOG_ONLY flag, there is no function to utilize
  the flag and add reflogs.
  2. Reference backends generally sort the updates by the refname. This
  wouldn't work for reflogs which need to ensure that they maintain the
  order of creation.
  3. In the files backend, reflog entries are added by obtaining locks
  on the refs themselves. This means each update in the transaction will
  obtain a ref_lock. This paradigm fails to account for the fact that
  there could be multiple reflog updates for a refname in a single
  transaction.
  4. The backends check for duplicate entries, which doesn't make sense
  in the context of adding multiple reflogs for a given refname.

To overcome these issues, we make the following changes:
  - Update the ref_update structure to also include the committer
  information. Using this, we can add a new function which only adds
  reflog updates to the transaction.
  - Add an index field to the ref_update structure. This helps order
  updates in a pre-defined order, which fixes #2.
  - While the ideal fix for #3 would be to actually introduce reflog
  locks, this wouldn't be possible without breaking backward
  compatibility. So we add a count field to the existing ref_lock. With
  this, multiple reflog updates can share a single ref_lock.

Overall, this series is a bit more involved, and I would appreciate it
if it receives a bit more scrutiny.

The series is based on top of e66fd72e97 (The fourteenth batch,
2024-12-06) with `kn/reftable-writer-log-write-verify` merged in.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v2:
- Split patch 5 into two separate patches. This should make it easier to
  review and reduce cognitive load in a single patch.
- In the reftable backend, instead of using `strintmap` to ensure we have
  new update_indexes for reflogs with the same refname, we now use the
  already available `update->index` field to increment the update_index.
- Clean up the code and follow some of the better practices.
- Add some clarity to the commit messages.
- Link to v1: https://lore.kernel.org/r/20241209-320-git-refs-migrate-reflogs-v1-0-d4bc37ee860f@gmail.com

---
Karthik Nayak (8):
      refs: include committer info in `ref_update` struct
      refs: add `index` field to `struct ref_udpate`
      refs/files: add count field to ref_lock
      refs: extract out refname verification in transactions
      refs: add `committer_info` to `ref_transaction_add_update()`
      refs: introduce the `ref_transaction_update_reflog` function
      refs: allow multiple reflog entries for the same refname
      refs: add support for migrating reflogs

 Documentation/git-refs.txt |   2 -
 refs.c                     | 168 +++++++++++++++++++++++++++++++++------------
 refs.h                     |  12 ++++
 refs/files-backend.c       | 131 +++++++++++++++++++++++------------
 refs/refs-internal.h       |   9 +++
 refs/reftable-backend.c    |  53 +++++++++++---
 t/t1460-refs-migrate.sh    |  73 ++++++++++++++------
 7 files changed, 328 insertions(+), 120 deletions(-)
---

Range-diff versus v1:

1:  627104646c ! 1:  81a493ae20 refs: include committer info in `ref_update` struct
    @@ refs/files-backend.c: static int commit_ref_update(struct files_ref_store *refs,
      			struct strbuf log_err = STRBUF_INIT;
     -			if (files_log_ref_write(refs, "HEAD",
     -						&lock->old_oid, oid,
    +-						logmsg, flags, &log_err)) {
     +			if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
    -+						oid, git_committer_info(0),
    - 						logmsg, flags, &log_err)) {
    ++						oid, NULL, logmsg, flags,
    ++						&log_err)) {
      				error("%s", log_err.buf);
      				strbuf_release(&log_err);
    + 			}
     @@ refs/files-backend.c: static int parse_and_write_reflog(struct files_ref_store *refs,
      	}
      
2:  7bc7c7cb1b = 2:  3c4d53de5c refs: add `index` field to `struct ref_udpate`
3:  807fff5d50 ! 3:  598ad493af refs/files: add count field to ref_lock
    @@ Commit message
     
         The upcoming patches will add support for reflog updates in ref
         transaction. This means, in a particular transaction we want to have ref
    -    updates and reflog updates. For refs, in a given transaction there can
    -    only be one update. But, we can theoretically have multiple reflog
    -    updates in a given transaction.
    +    updates and reflog updates. For a given ref in a given transaction there
    +    can be at most one update. But we can theoretically have multiple reflog
    +    updates for a given ref in a given transaction. A great example of this
    +    would be when migrating reflogs from one backend to another. There we
    +    would batch all the reflog updates for a given reference in a single
    +    transaction.
     
         The current flow does not support this, because currently refs & reflogs
         are treated as a single entity and capture the lock together. To
    @@ refs/files-backend.c: struct ref_lock {
      	char *ref_name;
      	struct lock_file lk;
      	struct object_id old_oid;
    -+	/* count keeps track of users of the lock */
    -+	unsigned int count;
    ++	unsigned int count; /* track users of the lock (ref update + reflog updates) */
      };
      
      struct files_ref_store {
    @@ refs/files-backend.c: static int lock_ref_for_update(struct files_ref_store *ref
     -		char *reason;
     +	lock = strmap_get(&backend_data->ref_locks, update->refname);
     +	if (lock) {
    -+		lock->count = lock->count + 1;
    ++		lock->count++;
     +	} else {
     +		ret = lock_raw_ref(refs, update->refname, mustexist,
     +				   affected_refnames,
4:  33473ad609 < -:  ---------- refs: extract out refname verification in transactions
5:  ae85d9e340 < -:  ---------- refs: introduce the `ref_transaction_update_reflog` function
-:  ---------- > 4:  64a3cbd91d refs: extract out refname verification in transactions
-:  ---------- > 5:  888f96facb refs: add `committer_info` to `ref_transaction_add_update()`
-:  ---------- > 6:  9253e1ceda refs: introduce the `ref_transaction_update_reflog` function
6:  b49872a6b5 ! 7:  6d12784851 refs: allow multiple reflog entries for the same refname
    @@ Metadata
      ## Commit message ##
         refs: allow multiple reflog entries for the same refname
     
    -    The reference transaction only allows a update for a given reference to
    -    avoid conflicts. This, however, isn't an issue for reflogs. There are no
    -    conflicts to be resolved in reflogs and when migrating reflogs between
    -    backends we'd have multiple reflog entries for the same refname.
    +    The reference transaction only allows a single update for a given
    +    reference to avoid conflicts. This, however, isn't an issue for reflogs.
    +    There are no conflicts to be resolved in reflogs and when migrating
    +    reflogs between backends we'd have multiple reflog entries for the same
    +    refname.
     
         So allow multiple reflog updates within a single transaction. Also the
         reflog creation logic isn't exposed to the end user. While this might
         change in the future, currently, this reduces the scope of issues to
         think about.
     
    -    This is required to add reflog migration support to `git refs migrate`
    -    which currently doesn't support it.
    +    In the reftable backend, the writer sorts all updates based on the
    +    update_index before writing to the block. When there are multiple
    +    reflogs for a given refname, it is essential that the order of the
    +    reflogs is maintained. So add the `index` value to the `update_index`.
    +    The `index` field is only be set when multiple reflog entries for a
    +    given refname are added and as such in most scenarios the old behavior
    +    remains.
    +
    +    This is required to add reflog migration support to `git refs migrate`.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      
      	/*
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
    + 	struct reftable_log_record *logs = NULL;
      	struct ident_split committer_ident = {0};
      	size_t logs_nr = 0, logs_alloc = 0, i;
    ++	uint64_t max_update_index = ts;
      	const char *committer_info;
    -+	struct strintmap logs_ts;
      	int ret = 0;
      
    - 	committer_info = git_committer_info(0);
    -@@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
    - 
    - 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
    - 
    -+	strintmap_init(&logs_ts, ts);
    -+
    - 	reftable_writer_set_limits(writer, ts, ts);
    - 
    - 	for (i = 0; i < arg->updates_nr; i++) {
    -@@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
    - 
    - 			if (create_reflog) {
    - 				struct ident_split c;
    -+				uint64_t update_index;
    - 
    - 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
    - 				log = &logs[logs_nr++];
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
      				}
      
      				fill_reftable_log_record(log, &c);
     -				log->update_index = ts;
     +
    -+				update_index = strintmap_get(&logs_ts, u->refname);
    -+				log->update_index = update_index;
    -+				strintmap_set(&logs_ts, u->refname, update_index+1);
    ++				/*
    ++				 * Updates are sorted by the writer. So updates for the same
    ++				 * refname need to contain different update indices.
    ++				 */
    ++				log->update_index = ts + u->index;
    ++
    ++				/*
    ++				 * Note the max update_index so the limit can be set later on.
    ++				 */
    ++				if (log->update_index > max_update_index)
    ++					max_update_index = log->update_index;
     +
      				log->refname = xstrdup(u->refname);
      				memcpy(log->value.update.new_hash,
      				       u->new_oid.hash, GIT_MAX_RAWSZ);
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
    - 
    - done:
    - 	assert(ret != REFTABLE_API_ERROR);
    -+	strintmap_clear(&logs_ts);
    - 	for (i = 0; i < logs_nr; i++)
    - 		reftable_log_record_release(&logs[i]);
    - 	free(logs);
    + 	 * and log blocks.
    + 	 */
    + 	if (logs) {
    ++		reftable_writer_set_limits(writer, ts, max_update_index);
    ++
    + 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
    + 		if (ret < 0)
    + 			goto done;
7:  0df1005b2b ! 8:  06dba479d6 refs: add support for migrating reflogs
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
      }
      
      struct migration_data {
    -+	unsigned int index;
    ++	unsigned int reflog_index;
      	struct ref_store *old_refs;
      	struct ref_transaction *transaction;
      	struct strbuf *errbuf;
    @@ refs.c: static int migrate_one_ref(const char *refname, const char *referent UNU
     +	data.old_refs = migration_data->old_refs;
     +	data.transaction = migration_data->transaction;
     +	data.errbuf = migration_data->errbuf;
    -+	data.index = &migration_data->index;
    ++	data.index = &migration_data->reflog_index;
     +
     +	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
     +					migrate_one_reflog_entry, &data);
    @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	 * Worktrees complicate the migration because every worktree has a
      	 * separate ref storage. While it should be feasible to implement, this
     @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
    - 	 *   1. Set up a new temporary directory and initialize it with the new
    - 	 *      format. This is where all refs will be migrated into.
    - 	 *
    --	 *   2. Enumerate all refs and write them into the new ref storage.
    --	 *      This operation is safe as we do not yet modify the main
    -+	 *   2. Enumerate all refs and reflogs and write them into the new ref
    -+	 *      storage. This operation is safe as we do not yet modify the main
    + 	 *      This operation is safe as we do not yet modify the main
      	 *      repository.
      	 *
    - 	 *   3. If we're in dry-run mode then we are done and can hand over the
    +-	 *   3. If we're in dry-run mode then we are done and can hand over the
    ++	 *   3. Enumerate all reflogs and write them into the new ref storage.
    ++	 *      This operation is safe as we do not yet modify the main
    ++	 *      repository.
    ++	 *
    ++	 *   4. If we're in dry-run mode then we are done and can hand over the
    + 	 *      directory to the caller for inspection. If not, we now start
    + 	 *      with the destructive part.
    + 	 *
    +-	 *   4. Delete the old ref storage from disk. As we have a copy of refs
    ++	 *   5. Delete the old ref storage from disk. As we have a copy of refs
    + 	 *      in the new ref storage it's okay(ish) if we now get interrupted
    + 	 *      as there is an equivalent copy of all refs available.
    + 	 *
    +-	 *   5. Move the new ref storage files into place.
    ++	 *   6. Move the new ref storage files into place.
    + 	 *
    +-	 *   6. Change the repository format to the new ref format.
    ++	 *  7. Change the repository format to the new ref format.
    + 	 */
    + 	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
    + 	if (!mkdtemp(new_gitdir.buf)) {
     @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	if (ret < 0)
      		goto done;
      
    -+	data.index = 1;
    ++	data.reflog_index = 1;
     +	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
     +	if (ret < 0)
     +		goto done;


--- 

base-commit: 09245f4b75863f4e94dac7feebaafce53a26965f
change-id: 20241111-320-git-refs-migrate-reflogs-a53e3a6cffc9

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH v2 1/8] refs: include committer info in `ref_update` struct
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface for creating reflogs, including custom
committer information.

Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.
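
The fallback can be sketched as follows. This is an illustration under
stated assumptions, not the code from the patch:
`sketch_default_committer()` is a hypothetical stand-in for
`git_committer_info(0)` and returns a fixed string, and
`sketch_format_reflog_line()` is a made-up helper that only formats the
line into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for git_committer_info(0); returns a fixed ident. */
static const char *sketch_default_committer(void)
{
	return "Default Committer <default@example.com> 0 +0000";
}

/*
 * Format one reflog line. A NULL `committer` means "no custom committer
 * was supplied", in which case the default is looked up at write time.
 * Returns the value of snprintf().
 */
static int sketch_format_reflog_line(char *buf, size_t size,
				     const char *old_hex, const char *new_hex,
				     const char *committer, const char *msg)
{
	assert(buf && size);

	if (!committer)
		committer = sketch_default_committer();

	if (msg && *msg)
		return snprintf(buf, size, "%s %s %s\t%s\n",
				old_hex, new_hex, committer, msg);
	return snprintf(buf, size, "%s %s %s\n", old_hex, new_hex, committer);
}
```

Resolving the default inside the writer, rather than when the update is
queued, keeps the timestamp tied to the moment the entry is written.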

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  1 +
 refs/files-backend.c    | 24 ++++++++++++++----------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 12 +++++++++++-
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/refs.c b/refs.c
index 762f3e324d59c60cd4f05c2f257e54de8deb00e5..f003e51c6bf5229bfbce8ce61ffad7cdba0572e0 100644
--- a/refs.c
+++ b/refs.c
@@ -1151,6 +1151,7 @@ void ref_transaction_free(struct ref_transaction *transaction)
 
 	for (i = 0; i < transaction->nr; i++) {
 		free(transaction->updates[i]->msg);
+		free(transaction->updates[i]->committer_info);
 		free((char *)transaction->updates[i]->new_target);
 		free((char *)transaction->updates[i]->old_target);
 		free(transaction->updates[i]);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..6078668c99ee254e794e3ba49689aa34e6022efd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 	struct strbuf sb = STRBUF_INIT;
 	int ret = 0;
 
+	if (!committer)
+		committer = git_committer_info(0);
+
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg) {
 		strbuf_addch(&sb, '\t');
@@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 }
 
 static int files_log_ref_write(struct files_ref_store *refs,
-			       const char *refname, const struct object_id *old_oid,
-			       const struct object_id *new_oid, const char *msg,
+			       const char *refname,
+			       const struct object_id *old_oid,
+			       const struct object_id *new_oid,
+			       const char *committer_info, const char *msg,
 			       int flags, struct strbuf *err)
 {
 	int logfd, result;
@@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
 
 	if (logfd < 0)
 		return 0;
-	result = log_ref_write_fd(logfd, old_oid, new_oid,
-				  git_committer_info(0), msg);
+	result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);
 	if (result) {
 		struct strbuf sb = STRBUF_INIT;
 		int save_errno = errno;
@@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
 	files_assert_main_repository(refs, "commit_ref_update");
 
 	clear_loose_ref_cache(refs);
-	if (files_log_ref_write(refs, lock->ref_name,
-				&lock->old_oid, oid,
+	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
 				logmsg, flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 		strbuf_addf(err, "cannot update the ref '%s': %s",
@@ -2007,9 +2010,9 @@ static int commit_ref_update(struct files_ref_store *refs,
 		if (head_ref && (head_flag & REF_ISSYMREF) &&
 		    !strcmp(head_ref, lock->ref_name)) {
 			struct strbuf log_err = STRBUF_INIT;
-			if (files_log_ref_write(refs, "HEAD",
-						&lock->old_oid, oid,
-						logmsg, flags, &log_err)) {
+			if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
+						oid, NULL, logmsg, flags,
+						&log_err)) {
 				error("%s", log_err.buf);
 				strbuf_release(&log_err);
 			}
@@ -2969,7 +2972,8 @@ static int parse_and_write_reflog(struct files_ref_store *refs,
 	}
 
 	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid,
-				&update->new_oid, update->msg, update->flags, err)) {
+				&update->new_oid, update->committer_info,
+				update->msg, update->flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 
 		strbuf_addf(err, "cannot update the ref '%s': %s",
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 58aa56d1b27c85d606ed7c8c0d908e4b87d1066b..0fd95cdacd99e4a728c22f5286f6b3f0f360c110 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -113,6 +113,7 @@ struct ref_update {
 	void *backend_data;
 	unsigned int type;
 	char *msg;
+	char *committer_info;
 
 	/*
 	 * If this ref_update was split off of a symref update via
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			}
 
 			if (create_reflog) {
+				struct ident_split c;
+
 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 				log = &logs[logs_nr++];
 				memset(log, 0, sizeof(*log));
 
-				fill_reftable_log_record(log, &committer_ident);
+				if (u->committer_info) {
+					if (split_ident_line(&c, u->committer_info,
+							     strlen(u->committer_info)))
+						BUG("failed splitting committer info");
+				} else {
+					c = committer_ident;
+				}
+
+				fill_reftable_log_record(log, &c);
 				log->update_index = ts;
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v2 2/8] refs: add `index` field to `struct ref_udpate`
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 3/8] refs/files: add count field to ref_lock Karthik Nayak
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reftable backend sorts its updates by refname before applying them;
this ensures that the references are stored sorted. When migrating
reflogs from one backend to another, the order of the reflogs must be
maintained. Add a new `index` field to the `ref_update` struct to
facilitate this.

This field is used in the reftable backend's sort comparison function
`transaction_update_cmp` to ensure that indexed updates maintain their
order.
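
A small sketch of such a comparison, assuming a simplified stand-in
struct (`sketch_update`, `sketch_update_cmp` and `sketch_sort_updates`
are hypothetical names). Unlike the patch, which subtracts the two
indexes, this sketch uses a three-way comparison so the result does not
depend on converting an unsigned difference to `int`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parts of an update the comparison needs. */
struct sketch_update {
	const char *refname;
	unsigned int index; /* 0 means "no explicit ordering requested" */
};

/*
 * If either side carries an index, order by index; otherwise fall back
 * to ordering by refname. Updates without an index therefore sort
 * before indexed ones, and indexed ones keep their relative order.
 */
static int sketch_update_cmp(const void *a, const void *b)
{
	const struct sketch_update *ua = a;
	const struct sketch_update *ub = b;

	if (ua->index || ub->index)
		return (ua->index > ub->index) - (ua->index < ub->index);

	return strcmp(ua->refname, ub->refname);
}

/* Sort an array of updates in place with the comparison above. */
static void sketch_sort_updates(struct sketch_update *updates, size_t nr)
{
	assert(updates || !nr);
	if (nr)
		qsort(updates, nr, sizeof(*updates), sketch_update_cmp);
}
```

With this ordering, two reflog entries for the same refname that carry
indexes 1 and 2 always come out in that order, regardless of where they
were queued relative to plain ref updates.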

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/refs-internal.h    |  7 +++++++
 refs/reftable-backend.c | 13 +++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0fd95cdacd99e4a728c22f5286f6b3f0f360c110..f5c733d099f0c6f1076a25f4f77d9d5eb345ec87 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -115,6 +115,13 @@ struct ref_update {
 	char *msg;
 	char *committer_info;
 
+	/*
+	 * The index overrides the default sort algorithm. This is needed
+	 * when migrating reflogs and we want to ensure we carry over the
+	 * same order.
+	 */
+	unsigned int index;
+
 	/*
 	 * If this ref_update was split off of a symref update via
 	 * split_symref_update(), then this member points at that
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
 
 static int transaction_update_cmp(const void *a, const void *b)
 {
-	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
-		      ((struct reftable_transaction_update *)b)->update->refname);
+	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
+	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
+
+	/*
+	 * If there is an index set, it should take preference (default is 0).
+	 * This ensures that updates with indexes are sorted amongst themselves.
+	 */
+	if (update_a->update->index || update_b->update->index)
+		return update_a->update->index - update_b->update->index;
+
+	return strcmp(update_a->update->refname, update_b->update->refname);
 }
 
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v2 3/8] refs/files: add count field to ref_lock
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 4/8] refs: extract out refname verification in transactions Karthik Nayak
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.

The upcoming patches will add support for reflog updates in ref
transactions. This means that in a particular transaction we want to
have both ref updates and reflog updates. For a given ref in a given
transaction there can be at most one update. But we can theoretically
have multiple reflog updates for a given ref in a given transaction. A
great example of this would be when migrating reflogs from one backend
to another. There we would batch all the reflog updates for a given
reference in a single transaction.

The current flow does not support this, because currently refs & reflogs
are treated as a single entity and capture the lock together. To
separate this, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.
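
The counting scheme can be sketched roughly like this, assuming a
simplified lock object (`sketch_lock`, `sketch_lock_acquire` and
`sketch_lock_release` are hypothetical names; the `released` flag merely
stands in for rolling back the real lock file):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock shared by several updates. */
struct sketch_lock {
	unsigned int count; /* number of updates currently holding the lock */
	int *released;      /* set to 1 once the underlying lock is dropped */
};

/*
 * Take the lock for one more update. If a lock for the ref already
 * exists, only bump its count; otherwise create it with a count of 1.
 * Returns NULL if allocation fails.
 */
static struct sketch_lock *sketch_lock_acquire(struct sketch_lock *existing,
					       int *released)
{
	struct sketch_lock *lock;

	if (existing) {
		existing->count++;
		return existing;
	}

	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return NULL;
	lock->count = 1;
	lock->released = released;
	return lock;
}

/* Drop one user; the lock itself goes away only with the last user. */
static void sketch_lock_release(struct sketch_lock *lock)
{
	assert(lock->count > 0);
	if (--lock->count)
		return;

	*lock->released = 1;
	free(lock);
}
```

A ref update and two reflog updates for the same ref would thus call
acquire three times and release three times, and only the third release
drops the lock.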

This patch only adds the `count` field to `ref_lock` and adds the logic
to increment and decrement the count. In a follow-up commit, we'll
separate the reflog update logic from ref updates and utilize this
functionality.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c | 58 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6078668c99ee254e794e3ba49689aa34e6022efd..02cb4907d8659e87a227fed4f60a5f6606be8764 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -71,6 +71,7 @@ struct ref_lock {
 	char *ref_name;
 	struct lock_file lk;
 	struct object_id old_oid;
+	unsigned int count; /* track users of the lock (ref update + reflog updates) */
 };
 
 struct files_ref_store {
@@ -638,9 +639,12 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 
 static void unlock_ref(struct ref_lock *lock)
 {
-	rollback_lock_file(&lock->lk);
-	free(lock->ref_name);
-	free(lock);
+	lock->count--;
+	if (!lock->count) {
+		rollback_lock_file(&lock->lk);
+		free(lock->ref_name);
+		free(lock);
+	}
 }
 
 /*
@@ -696,6 +700,7 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	*lock_p = CALLOC_ARRAY(lock, 1);
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 	files_ref_path(refs, &ref_file, refname);
 
 retry:
@@ -1169,6 +1174,7 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 		goto error_return;
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 
 	if (raceproof_create_file(ref_file.buf, create_reflock, &lock->lk)) {
 		unable_to_lock_message(ref_file.buf, errno, err);
@@ -2535,6 +2541,12 @@ static int check_old_oid(struct ref_update *update, struct object_id *oid,
 	return -1;
 }
 
+struct files_transaction_backend_data {
+	struct ref_transaction *packed_transaction;
+	int packed_refs_locked;
+	struct strmap ref_locks;
+};
+
 /*
  * Prepare for carrying out update:
  * - Lock the reference referred to by update.
@@ -2557,11 +2569,14 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 {
 	struct strbuf referent = STRBUF_INIT;
 	int mustexist = ref_update_expects_existing_old_ref(update);
+	struct files_transaction_backend_data *backend_data;
 	int ret = 0;
 	struct ref_lock *lock;
 
 	files_assert_main_repository(refs, "lock_ref_for_update");
 
+	backend_data = transaction->backend_data;
+
 	if ((update->flags & REF_HAVE_NEW) && ref_update_has_null_new_value(update))
 		update->flags |= REF_DELETING;
 
@@ -2572,18 +2587,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			goto out;
 	}
 
-	ret = lock_raw_ref(refs, update->refname, mustexist,
-			   affected_refnames,
-			   &lock, &referent,
-			   &update->type, err);
-	if (ret) {
-		char *reason;
+	lock = strmap_get(&backend_data->ref_locks, update->refname);
+	if (lock) {
+		lock->count++;
+	} else {
+		ret = lock_raw_ref(refs, update->refname, mustexist,
+				   affected_refnames,
+				   &lock, &referent,
+				   &update->type, err);
+		if (ret) {
+			char *reason;
+
+			reason = strbuf_detach(err, NULL);
+			strbuf_addf(err, "cannot lock ref '%s': %s",
+				    ref_update_original_update_refname(update), reason);
+			free(reason);
+			goto out;
+		}
 
-		reason = strbuf_detach(err, NULL);
-		strbuf_addf(err, "cannot lock ref '%s': %s",
-			    ref_update_original_update_refname(update), reason);
-		free(reason);
-		goto out;
+		strmap_put(&backend_data->ref_locks, update->refname, lock);
 	}
 
 	update->backend_data = lock;
@@ -2730,11 +2752,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	return ret;
 }
 
-struct files_transaction_backend_data {
-	struct ref_transaction *packed_transaction;
-	int packed_refs_locked;
-};
-
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
@@ -2767,6 +2784,8 @@ static void files_transaction_cleanup(struct files_ref_store *refs,
 		if (backend_data->packed_refs_locked)
 			packed_refs_unlock(refs->packed_ref_store);
 
+		strmap_clear(&backend_data->ref_locks, 0);
+
 		free(backend_data);
 	}
 
@@ -2796,6 +2815,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		goto cleanup;
 
 	CALLOC_ARRAY(backend_data, 1);
+	strmap_init(&backend_data->ref_locks);
 	transaction->backend_data = backend_data;
 
 	/*

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v2 4/8] refs: extract out refname verification in transactions
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (2 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 3/8] refs/files: add count field to ref_lock Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 10:36   ` [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified by:

  - Ensuring it is not a pseudoref.
  - Checking the refname format.

These checks will also be needed in a following commit where the
function to add reflog updates to the transaction is introduced. Extract
the code out into a new static function.
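As a rough illustration of the extracted check, here is a minimal,
self-contained sketch. It is not the code from this patch: the flag
value, the `sketch_` helpers and the simplified format rule are all
assumptions made for the example, standing in for
`REF_SKIP_REFNAME_VERIFICATION`, `is_pseudo_ref()` and
`check_refname_format()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag value for this sketch; not Git's actual definition. */
#define SKETCH_SKIP_REFNAME_VERIFICATION (1u << 0)

/* Stand-in for is_pseudo_ref(): treats two well-known names as pseudorefs. */
static int sketch_is_pseudo_ref(const char *refname)
{
	return !strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD");
}

/* Simplified stand-in for the format check: rejects "", ".." and spaces. */
static int sketch_bad_format(const char *refname)
{
	return !*refname || strstr(refname, "..") || strchr(refname, ' ');
}

/* Returns 1 if the name may be used, 0 (with a message in err) otherwise. */
static int sketch_refname_valid(const char *refname, unsigned int flags,
				char *err, size_t errlen)
{
	assert(err);

	/* Callers may opt out of all verification with a single flag. */
	if (flags & SKETCH_SKIP_REFNAME_VERIFICATION)
		return 1;

	if (sketch_is_pseudo_ref(refname)) {
		snprintf(err, errlen, "refusing to update pseudoref '%s'",
			 refname);
		return 0;
	}
	if (sketch_bad_format(refname)) {
		snprintf(err, errlen,
			 "refusing to update ref with bad name '%s'", refname);
		return 0;
	}
	return 1;
}
```

Keeping both checks behind one helper means a second caller, such as
a function that only queues reflog updates, gets the same behavior
without duplicating the conditionals.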

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801 100644
--- a/refs.c
+++ b/refs.c
@@ -1196,6 +1196,28 @@ struct ref_update *ref_transaction_add_update(
 	return update;
 }
 
+static int transaction_refname_valid(const char *refname,
+				     const struct object_id *new_oid,
+				     unsigned int flags, struct strbuf *err)
+{
+	if (flags & REF_SKIP_REFNAME_VERIFICATION)
+		return 1;
+
+	if (is_pseudo_ref(refname)) {
+		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
+			    refname);
+		return 0;
+	} else if ((new_oid && !is_null_oid(new_oid)) ?
+		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
+		 !refname_is_safe(refname)) {
+		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
+			    refname);
+		return 0;
+	}
+
+	return 1;
+}
+
 int ref_transaction_update(struct ref_transaction *transaction,
 			   const char *refname,
 			   const struct object_id *new_oid,
@@ -1213,21 +1235,8 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    ((new_oid && !is_null_oid(new_oid)) ?
-		     check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
-			   !refname_is_safe(refname))) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+	if (!transaction_refname_valid(refname, new_oid, flags, err))
 		return -1;
-	}
-
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
-		return -1;
-	}
 
 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
 		BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);

-- 
2.47.1



* [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (3 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 4/8] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-13 10:36   ` [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `ref_transaction_add_update()` function creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  9 +++++++--
 refs/files-backend.c    | 14 ++++++++------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c |  6 ++++--
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801..428ca256f3e5860554e9a7fa42a8368bb2689b31 100644
--- a/refs.c
+++ b/refs.c
@@ -1166,6 +1166,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg)
 {
 	struct ref_update *update;
@@ -1190,8 +1191,12 @@ struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if ((flags & REF_HAVE_OLD) && old_oid)
 		oidcpy(&update->old_oid, old_oid);
-	if (!(flags & REF_SKIP_CREATE_REFLOG))
+	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
+		if (committer_info)
+			update->committer_info = xstrdup(committer_info);
+
 		update->msg = normalize_reflog_message(msg);
+	}
 
 	return update;
 }
@@ -1253,7 +1258,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
 
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
-				   old_target, msg);
+				   old_target, NULL, msg);
 	return 0;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 02cb4907d8659e87a227fed4f60a5f6606be8764..255fed8354cae982f785b1b85340e2a1eeecf2a6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1270,7 +1270,7 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 	ref_transaction_add_update(
 			transaction, r->name,
 			REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | REF_IS_PRUNING,
-			null_oid(), &r->oid, NULL, NULL, NULL);
+			null_oid(), &r->oid, NULL, NULL, NULL, NULL);
 	if (ref_transaction_commit(transaction, &err))
 		goto cleanup;
 
@@ -2417,7 +2417,7 @@ static int split_head_update(struct ref_update *update,
 			transaction, "HEAD",
 			update->flags | REF_LOG_ONLY | REF_NO_DEREF,
 			&update->new_oid, &update->old_oid,
-			NULL, NULL, update->msg);
+			NULL, NULL, update->committer_info, update->msg);
 
 	/*
 	 * Add "HEAD". This insertion is O(N) in the transaction
@@ -2481,7 +2481,8 @@ static int split_symref_update(struct ref_update *update,
 			transaction, referent, new_flags,
 			update->new_target ? NULL : &update->new_oid,
 			update->old_target ? NULL : &update->old_oid,
-			update->new_target, update->old_target, update->msg);
+			update->new_target, update->old_target, NULL,
+			update->msg);
 
 	new_update->parent_update = update;
 
@@ -2914,7 +2915,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					packed_transaction, update->refname,
 					REF_HAVE_NEW | REF_NO_DEREF,
 					&update->new_oid, NULL,
-					NULL, NULL, NULL);
+					NULL, NULL, NULL, NULL);
 		}
 	}
 
@@ -3094,12 +3095,13 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 			ref_transaction_add_update(loose_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, NULL);
+						   update->new_target, NULL, update->committer_info,
+						   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   &update->new_oid, &update->old_oid,
-						   NULL, NULL, NULL);
+						   NULL, NULL, update->committer_info, NULL);
 		}
 	}
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f5c733d099f0c6f1076a25f4f77d9d5eb345ec87..79b287c5ec5c7d8f759869cf93cda405640186dc 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -162,6 +162,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg);
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c008f20be719fec3af6a8f81c821cb9c263764d7..b2e3ba877de9e59fea5a4d066eb13e60ef22a32b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1078,7 +1078,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			new_update = ref_transaction_add_update(
 					transaction, "HEAD",
 					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
-					&u->new_oid, &u->old_oid, NULL, NULL, u->msg);
+					&u->new_oid, &u->old_oid, NULL, NULL, NULL,
+					u->msg);
 			string_list_insert(&affected_refnames, new_update->refname);
 		}
 
@@ -1161,7 +1162,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 					transaction, referent.buf, new_flags,
 					u->new_target ? NULL : &u->new_oid,
 					u->old_target ? NULL : &u->old_oid,
-					u->new_target, u->old_target, u->msg);
+					u->new_target, u->old_target,
+					u->committer_info, u->msg);
 
 				new_update->parent_update = u;
 

-- 
2.47.1



* [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (4 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 11:44     ` Christian Couder
  2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-13 10:36   ` [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
                     ` (2 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Introduce a new function `ref_transaction_update_reflog` for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few more things. It:
  - Enforces that only a reflog entry is added and does not update the
  ref itself.
  - Allows the users to also provide the committer information. This
  means clients can add reflog entries with custom committer
  information.

A follow-up commit will utilize this function to add reflog support to
`git refs migrate`.
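The flag handling described above can be sketched in isolation. This
is only an illustration under stated assumptions: the `SKETCH_` bit
values and `struct sketch_update` are made up for the example and do
not match Git's definitions; only the order of operations mirrors the
description (force log-only and no-deref, then drop the old-value
check).

```c
#include <assert.h>

/* Hypothetical bit values for this sketch; Git defines its own. */
#define SKETCH_REF_NO_DEREF (1u << 0)
#define SKETCH_REF_HAVE_NEW (1u << 1)
#define SKETCH_REF_HAVE_OLD (1u << 2)
#define SKETCH_REF_LOG_ONLY (1u << 3)

/* Hypothetical, trimmed-down stand-in for a queued update. */
struct sketch_update {
	unsigned int flags;
	unsigned int index;
	const char *committer_info;
};

static void sketch_queue_reflog_update(struct sketch_update *update,
				       unsigned int flags,
				       const char *committer_info,
				       unsigned int index)
{
	assert(update);

	/* Force a log-only update that never follows symrefs. */
	flags |= SKETCH_REF_LOG_ONLY | SKETCH_REF_NO_DEREF;

	/*
	 * The old value is still recorded in the log entry, but it is
	 * not verified against the ref, so the flag is cleared.
	 */
	flags &= ~SKETCH_REF_HAVE_OLD;

	update->flags = flags;
	update->index = index;
	update->committer_info = committer_info;
}
```

Whatever flags the caller passes, the queued update ends up log-only,
which is what distinguishes this entry point from a regular update.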

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c               | 44 ++++++++++++++++++++++++++++++++++++++------
 refs.h               | 12 ++++++++++++
 refs/files-backend.c | 24 ++++++++++++++++--------
 3 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index 428ca256f3e5860554e9a7fa42a8368bb2689b31..9f539369bc94a25594adc3e95847f2fe72f58a08 100644
--- a/refs.c
+++ b/refs.c
@@ -1203,20 +1203,21 @@ struct ref_update *ref_transaction_add_update(
 
 static int transaction_refname_valid(const char *refname,
 				     const struct object_id *new_oid,
-				     unsigned int flags, struct strbuf *err)
+				     unsigned int flags, unsigned int reflog,
+				     struct strbuf *err)
 {
 	if (flags & REF_SKIP_REFNAME_VERIFICATION)
 		return 1;
 
 	if (is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
+		const char *what = reflog ? "reflog for pseudoref" : "pseudoref";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	} else if ((new_oid && !is_null_oid(new_oid)) ?
 		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
 		 !refname_is_safe(refname)) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+		const char *what = reflog ? "reflog with bad name" : "ref with bad name";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	}
 
@@ -1240,7 +1241,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	if (!transaction_refname_valid(refname, new_oid, flags, err))
+	if (!transaction_refname_valid(refname, new_oid, flags, 0, err))
 		return -1;
 
 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
@@ -1259,6 +1260,37 @@ int ref_transaction_update(struct ref_transaction *transaction,
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
 				   old_target, NULL, msg);
+
+	return 0;
+}
+
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err)
+{
+	struct ref_update *update;
+
+	assert(err);
+
+	if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
+		return -1;
+
+	flags |= REF_LOG_ONLY | REF_NO_DEREF;
+
+	update = ref_transaction_add_update(transaction, refname, flags,
+					    new_oid, old_oid, NULL, NULL,
+					    committer_info, msg);
+	/*
+	 * While we do set the old_oid value, we unset the flag to skip
+	 * old_oid verification which only makes sense for refs.
+	 */
+	update->flags &= ~REF_HAVE_OLD;
+	update->index = index;
+
 	return 0;
 }
 
diff --git a/refs.h b/refs.h
index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..67f8b3eef3f2101409e5cc6eb2241d99e9f7d95c 100644
--- a/refs.h
+++ b/refs.h
@@ -727,6 +727,18 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err);
 
+/*
+ * Similar to `ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information.
+ */
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err);
+
 /*
  * Add a reference creation to transaction. new_oid is the value that
  * the reference should have after the update; it must not be
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 255fed8354cae982f785b1b85340e2a1eeecf2a6..c11213f52065bcf2fa7612df8f9500692ee2d02c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3080,10 +3080,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 
 		/*
-		 * packed-refs don't support symbolic refs and root refs, so we
-		 * have to queue these references via the loose transaction.
+		 * packed-refs don't support symbolic refs, root refs and reflogs,
+		 * so we have to queue these references via the loose transaction.
 		 */
-		if (update->new_target || is_root_ref(update->refname)) {
+		if (update->new_target ||
+		    is_root_ref(update->refname) ||
+		    (update->flags & REF_LOG_ONLY)) {
 			if (!loose_transaction) {
 				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
 				if (!loose_transaction) {
@@ -3092,11 +3094,17 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 				}
 			}
 
-			ref_transaction_add_update(loose_transaction, update->refname,
-						   update->flags & ~REF_HAVE_OLD,
-						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, update->committer_info,
-						   NULL);
+			if (update->flags & REF_LOG_ONLY)
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags, &update->new_oid,
+							   &update->old_oid, NULL, NULL,
+							   update->committer_info, update->msg);
+			else
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags & ~REF_HAVE_OLD,
+							   update->new_target ? NULL : &update->new_oid, NULL,
+							   update->new_target, NULL, update->committer_info,
+							   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,

-- 
2.47.1



* [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (5 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-13 10:36   ` [PATCH v2 8/8] refs: add support for migrating reflogs Karthik Nayak
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.

So allow multiple reflog updates within a single transaction. The
reflog creation logic also isn't exposed to the end user. While this
might change in the future, for now it reduces the scope of issues to
think about.

In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a
given refname are added and as such in most scenarios the old behavior
remains.
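To see why the offset matters, consider a simplified model of a
writer that sorts log records before writing them. The sketch below
is an assumption-laden illustration, not the reftable code: `struct
sketch_log`, the comparison function and the ascending order are made
up for the example, and the real key ordering may differ. The point
is only that `ts + index` gives each entry of the same refname a
distinct key, and that the largest key has to be tracked so the
writer's limits can cover it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down log record. */
struct sketch_log {
	const char *refname;
	uint64_t update_index;
	const char *msg;
};

/* Simplified ordering: by refname, then by ascending update_index. */
static int sketch_log_cmp(const void *a, const void *b)
{
	const struct sketch_log *x = a, *y = b;
	int cmp = strcmp(x->refname, y->refname);

	if (cmp)
		return cmp;
	return (x->update_index > y->update_index) -
	       (x->update_index < y->update_index);
}

/*
 * Give every record the key ts + index[i], sort the records, and
 * return the largest key so that the caller can widen its limits.
 */
static uint64_t sketch_assign_and_sort(struct sketch_log *logs, size_t nr,
				       const unsigned int *index, uint64_t ts)
{
	uint64_t max_update_index = ts;
	size_t i;

	assert(logs && index);

	for (i = 0; i < nr; i++) {
		logs[i].update_index = ts + index[i];
		if (logs[i].update_index > max_update_index)
			max_update_index = logs[i].update_index;
	}
	qsort(logs, nr, sizeof(*logs), sketch_log_cmp);

	return max_update_index;
}
```

With every `index` left at zero, all keys collapse to `ts`, which
corresponds to the old behavior mentioned above.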

This is required to add reflog migration support to `git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 22 +++++++++++++++++++---
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}
 
 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				/*
+				 * Updates are sorted by the writer. So updates for the same
+				 * refname need to contain different update indices.
+				 */
+				log->update_index = ts + u->index;
+
+				/*
+				 * Note the max update_index so the limit can be set later on.
+				 */
+				if (log->update_index > max_update_index)
+					max_update_index = log->update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;

-- 
2.47.1



* [PATCH v2 8/8] refs: add support for migrating reflogs
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (6 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-13 10:36   ` Karthik Nayak
  2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
  8 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-13 10:36 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit added support
for adding reflog updates in ref transactions. Using the added
functionality, bake in reflog support for `git refs migrate`.

To ensure that the order of the reflogs is maintained during the
migration, we add the index for each reflog update as we iterate over
the reflogs from the old reference backend, so that the new backend
can retain that order.
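The indexing scheme can be pictured with a small, self-contained
sketch. Everything here is hypothetical (the callback type, `struct
sketch_migration` and the fixed-size array are made up for the
example); it only shows one counter being shared across the
per-reflog iterations, so that every entry receives a unique,
increasing index in the order it is visited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-entry callback type for this sketch. */
typedef int (*sketch_entry_fn)(const char *msg, void *cb_data);

struct sketch_migration {
	unsigned int next_index;  /* shared across every reflog */
	unsigned int assigned[8]; /* index handed to each entry, in order */
	size_t nr;
};

/* Called once per reflog entry; hands out the next index. */
static int sketch_migrate_entry(const char *msg, void *cb_data)
{
	struct sketch_migration *m = cb_data;

	(void)msg; /* a real callback would queue the entry here */
	if (m->nr == sizeof(m->assigned) / sizeof(m->assigned[0]))
		return -1;
	m->assigned[m->nr++] = m->next_index++;
	return 0;
}

/* Walks one reflog in order and stops at the first failing callback. */
static int sketch_for_each_entry(const char *const *msgs, size_t nr,
				 sketch_entry_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(msgs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because the counter lives in the outer migration state rather than in
the per-reflog state, the indices keep increasing from one reflog to
the next instead of restarting for each refname.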

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-refs.txt |  2 --
 refs.c                     | 89 ++++++++++++++++++++++++++++++++--------------
 t/t1460-refs-migrate.sh    | 73 +++++++++++++++++++++++++------------
 3 files changed, 113 insertions(+), 51 deletions(-)

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index ce31f93061db5e5d16aca516dd3d15f6527db870..9829984b0a4c4f54ec7f9b6c6c7072f62b1d198d 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -57,8 +57,6 @@ KNOWN LIMITATIONS
 
 The ref format migration has several known limitations in its current form:
 
-* It is not possible to migrate repositories that have reflogs.
-
 * It is not possible to migrate repositories that have worktrees.
 
 * There is no way to block concurrent writes to the repository during an
diff --git a/refs.c b/refs.c
index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
--- a/refs.c
+++ b/refs.c
@@ -30,6 +30,7 @@
 #include "date.h"
 #include "commit.h"
 #include "wildmatch.h"
+#include "ident.h"
 
 /*
  * List of all available backends
@@ -2673,6 +2674,7 @@ int ref_update_check_old_target(const char *referent, struct ref_update *update,
 }
 
 struct migration_data {
+	unsigned int reflog_index;
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
 	struct strbuf *errbuf;
@@ -2708,6 +2710,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 	return ret;
 }
 
+struct reflog_migration_data {
+	unsigned int *index;
+	const char *refname;
+	struct ref_store *old_refs;
+	struct ref_transaction *transaction;
+	struct strbuf *errbuf;
+};
+
+static int migrate_one_reflog_entry(struct object_id *old_oid,
+				    struct object_id *new_oid,
+				    const char *committer,
+				    timestamp_t timestamp, int tz,
+				    const char *msg, void *cb_data)
+{
+	struct reflog_migration_data *data = cb_data;
+	struct strbuf sb = STRBUF_INIT;
+	const char *date;
+	int ret;
+
+	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
+	/* committer contains name and email */
+	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
+
+	ret = ref_transaction_update_reflog(data->transaction, data->refname,
+					    new_oid, old_oid, sb.buf,
+					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
+					    (*data->index)++, data->errbuf);
+	strbuf_release(&sb);
+
+	return ret;
+}
+
+static int migrate_one_reflog(const char *refname, void *cb_data)
+{
+	struct migration_data *migration_data = cb_data;
+	struct reflog_migration_data data;
+
+	data.refname = refname;
+	data.old_refs = migration_data->old_refs;
+	data.transaction = migration_data->transaction;
+	data.errbuf = migration_data->errbuf;
+	data.index = &migration_data->reflog_index;
+
+	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
+					migrate_one_reflog_entry, &data);
+}
+
 static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
 {
 	struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
@@ -2774,13 +2823,6 @@ static int move_files(const char *from_path, const char *to_path, struct strbuf
 	return ret;
 }
 
-static int count_reflogs(const char *reflog UNUSED, void *payload)
-{
-	size_t *reflog_count = payload;
-	(*reflog_count)++;
-	return 0;
-}
-
 static int has_worktrees(void)
 {
 	struct worktree **worktrees = get_worktrees();
@@ -2806,7 +2848,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	struct ref_transaction *transaction = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	struct migration_data data;
-	size_t reflog_count = 0;
 	int did_migrate_refs = 0;
 	int ret;
 
@@ -2818,21 +2859,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 
 	old_refs = get_main_ref_store(repo);
 
-	/*
-	 * We do not have any interfaces that would allow us to write many
-	 * reflog entries. Once we have them we can remove this restriction.
-	 */
-	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
-		strbuf_addstr(errbuf, "cannot count reflogs");
-		ret = -1;
-		goto done;
-	}
-	if (reflog_count) {
-		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
-		ret = -1;
-		goto done;
-	}
-
 	/*
 	 * Worktrees complicate the migration because every worktree has a
 	 * separate ref storage. While it should be feasible to implement, this
@@ -2858,17 +2884,21 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	 *      This operation is safe as we do not yet modify the main
 	 *      repository.
 	 *
-	 *   3. If we're in dry-run mode then we are done and can hand over the
+	 *   3. Enumerate all reflogs and write them into the new ref storage.
+	 *      This operation is safe as we do not yet modify the main
+	 *      repository.
+	 *
+	 *   4. If we're in dry-run mode then we are done and can hand over the
 	 *      directory to the caller for inspection. If not, we now start
 	 *      with the destructive part.
 	 *
-	 *   4. Delete the old ref storage from disk. As we have a copy of refs
+	 *   5. Delete the old ref storage from disk. As we have a copy of refs
 	 *      in the new ref storage it's okay(ish) if we now get interrupted
 	 *      as there is an equivalent copy of all refs available.
 	 *
-	 *   5. Move the new ref storage files into place.
+	 *   6. Move the new ref storage files into place.
 	 *
-	 *   6. Change the repository format to the new ref format.
+	 *   7. Change the repository format to the new ref format.
 	 */
 	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
 	if (!mkdtemp(new_gitdir.buf)) {
@@ -2910,6 +2940,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
+	data.reflog_index = 1;
+	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+	if (ret < 0)
+		goto done;
+
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
 		goto done;
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index 1bfff3a7afd5acc470424dfe7ec3e97d45f5c481..f59bc4860f19c4af82dc6f2984bdb69d61fe3ec2 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -7,23 +7,44 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
+# Migrate the provided repository from one format to the other and
+# verify that the references and logs are migrated over correctly.
+# Usage: test_migration <repo> <format> <skip_reflog_verify>
+#   <repo> is the relative path to the repo to be migrated.
+#   <format> is the ref format to be migrated to.
+#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
 test_migration () {
-	git -C "$1" for-each-ref --include-root-refs \
+	repo=$1 &&
+	format=$2 &&
+	skip_reflog_verify=${3:-false} &&
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
-	git -C "$1" refs migrate --ref-format="$2" &&
-	git -C "$1" for-each-ref --include-root-refs \
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >expect_logs &&
+		git -C "$repo" reflog list >expect_log_list
+	fi &&
+
+	git -C "$repo" refs migrate --ref-format="$format" &&
+
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
 	test_cmp expect actual &&
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >actual_logs &&
+		git -C "$repo" reflog list >actual_log_list &&
+		test_cmp expect_logs actual_logs &&
+		test_cmp expect_log_list actual_log_list
+	fi &&
 
-	git -C "$1" rev-parse --show-ref-format >actual &&
-	echo "$2" >expect &&
+	git -C "$repo" rev-parse --show-ref-format >actual &&
+	echo "$format" >expect &&
 	test_cmp expect actual
 }
 
 test_expect_success 'setup' '
-	rm -rf .git &&
-	# The migration does not yet support reflogs.
-	git config --global core.logAllRefUpdates false
+	rm -rf .git
 '
 
 test_expect_success "superfluous arguments" '
@@ -78,19 +99,6 @@ do
 			test_cmp expect err
 		'
 
-		test_expect_success "$from_format -> $to_format: migration with reflog fails" '
-			test_when_finished "rm -rf repo" &&
-			git init --ref-format=$from_format repo &&
-			test_config -C repo core.logAllRefUpdates true &&
-			test_commit -C repo logged &&
-			test_must_fail git -C repo refs migrate \
-				--ref-format=$to_format 2>err &&
-			cat >expect <<-EOF &&
-			error: migrating reflogs is not supported yet
-			EOF
-			test_cmp expect err
-		'
-
 		test_expect_success "$from_format -> $to_format: migration with worktree fails" '
 			test_when_finished "rm -rf repo" &&
 			git init --ref-format=$from_format repo &&
@@ -141,7 +149,7 @@ do
 			test_commit -C repo initial &&
 			test-tool -C repo ref-store main update-ref "" refs/heads/broken \
 				"$(test_oid 001)" "$ZERO_OID" REF_SKIP_CREATE_REFLOG,REF_SKIP_OID_VERIFICATION &&
-			test_migration repo "$to_format" &&
+			test_migration repo "$to_format" true &&
 			test_oid 001 >expect &&
 			git -C repo rev-parse refs/heads/broken >actual &&
 			test_cmp expect actual
@@ -195,6 +203,27 @@ do
 			git -C repo rev-parse --show-ref-format >actual &&
 			test_cmp expect actual
 		'
+
+		test_expect_success "$from_format -> $to_format: reflogs of symrefs with target deleted" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			git -C repo branch branch-1 HEAD &&
+			git -C repo symbolic-ref refs/heads/symref refs/heads/branch-1 &&
+			cat >input <<-EOF &&
+			delete refs/heads/branch-1
+			EOF
+			git -C repo update-ref --stdin <input &&
+			test_migration repo "$to_format"
+		'
+
+		test_expect_success "$from_format -> $to_format: reflogs order is retained" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit --date "100005000 +0700" --no-tag -C repo initial &&
+			test_commit --date "100003000 +0700" --no-tag -C repo second &&
+			test_migration repo "$to_format"
+		'
 	done
 done
 

-- 
2.47.1


* Re: [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-13 10:36   ` [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-13 11:44     ` Christian Couder
  2024-12-13 19:49       ` karthik nayak
  2024-12-13 12:24     ` Patrick Steinhardt
  1 sibling, 1 reply; 93+ messages in thread
From: Christian Couder @ 2024-12-13 11:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, Christian Couder

On Fri, Dec 13, 2024 at 11:36 AM Karthik Nayak <karthik.188@gmail.com> wrote:

> +int ref_transaction_update_reflog(struct ref_transaction *transaction,
> +                                 const char *refname,
> +                                 const struct object_id *new_oid,
> +                                 const struct object_id *old_oid,
> +                                 const char *committer_info, unsigned int flags,
> +                                 const char *msg, unsigned int index,
> +                                 struct strbuf *err)
> +{
> +       struct ref_update *update;
> +
> +       assert(err);
> +
> +       if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
> +               return -1;
> +
> +       flags |= REF_LOG_ONLY | REF_NO_DEREF;

If we could switch the above lines like this:

      flags |= REF_LOG_ONLY | REF_NO_DEREF;

      if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
               return -1;

maybe we wouldn't need transaction_refname_valid() to take an
'unsigned int reflog' argument and we could instead use 'flags &
REF_LOG_ONLY' inside that function?
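
A minimal sketch of that idea, assuming a made-up flag value and
hypothetical helper names (`SKETCH_REF_LOG_ONLY`, `sketch_update_kind`,
`sketch_format_refusal`); none of these exist in Git. It only shows that
once the flag is OR'ed in before validation, the validator can derive the
wording from `flags` alone, without a separate `reflog` argument:

```c
#include <stdio.h>

/* Hypothetical value for this sketch; Git defines the real REF_LOG_ONLY. */
#define SKETCH_REF_LOG_ONLY (1u << 7)

/* Picks the noun for the message from the flags, not from an extra argument. */
static const char *sketch_update_kind(unsigned int flags)
{
	return (flags & SKETCH_REF_LOG_ONLY) ? "reflog with bad name"
					      : "ref with bad name";
}

/* Formats the refusal message into buf and returns the snprintf() result. */
static int sketch_format_refusal(char *buf, size_t len, unsigned int flags,
				 const char *refname)
{
	return snprintf(buf, len, "refusing to update %s '%s'",
			sketch_update_kind(flags), refname);
}
```

The caller has to set the flag before calling the validator, which is why
the two statements need to be swapped.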

> +       update = ref_transaction_add_update(transaction, refname, flags,
> +                                           new_oid, old_oid, NULL, NULL,
> +                                           committer_info, msg);
> +       /*
> +        * While we do set the old_oid value, we unset the flag to skip
> +        * old_oid verification which only makes sense for refs.
> +        */
> +       update->flags &= ~REF_HAVE_OLD;
> +       update->index = index;
> +
>         return 0;
>  }

* Re: [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-13 10:36   ` [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
@ 2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-13 19:43       ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-13 12:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Fri, Dec 13, 2024 at 11:36:50AM +0100, Karthik Nayak wrote:
> The `ref_transaction_add_update()` creates the `ref_update` struct. To
> facilitate addition of reflogs in the next commit, the function needs to
> accommodate setting the `committer_info` field in the struct. So modify
> the function to also take `committer_info` as an argument and set it
> accordingly.

I was wondering a bit whether we could instead pull out a
`add_update_internal()` function so that we don't need to modify all
callers of `ref_transaction_add_update()`. Because ultimately, we don't
use the field anywhere except from `ref_transaction_add_reflog_update()`
as far as I can see.

This is more of a thought than a strong opinion, so feel free to ignore.

> @@ -1190,8 +1191,12 @@ struct ref_update *ref_transaction_add_update(
>  		oidcpy(&update->new_oid, new_oid);
>  	if ((flags & REF_HAVE_OLD) && old_oid)
>  		oidcpy(&update->old_oid, old_oid);
> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
> +		if (committer_info)
> +			update->committer_info = xstrdup(committer_info);
> +
>  		update->msg = normalize_reflog_message(msg);
> +	}
>  
>  	return update;
>  }

This can use `xstrdup_or_null()` and then drop the condition.
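
As a rough illustration of why the condition becomes redundant, here is a
stand-alone sketch with a stand-in helper. `sketch_strdup_or_null` and
`struct sketch_update` are hypothetical names for this example only;
unlike Git's `xstrdup()` family, the stand-in returns NULL on allocation
failure instead of dying:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for this sketch: duplicate a string, or pass NULL through. */
static char *sketch_strdup_or_null(const char *s)
{
	char *copy;
	size_t len;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Reduced stand-in for the update struct; only the relevant field. */
struct sketch_update {
	char *committer_info;
};

/* With a NULL-tolerant helper, no `if (committer_info)` guard is needed. */
static void sketch_set_committer(struct sketch_update *update,
				 const char *committer_info)
{
	update->committer_info = sketch_strdup_or_null(committer_info);
}
```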

Patrick

* Re: [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-13 10:36   ` [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
  2024-12-13 11:44     ` Christian Couder
@ 2024-12-13 12:24     ` Patrick Steinhardt
  1 sibling, 0 replies; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-13 12:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Fri, Dec 13, 2024 at 11:36:51AM +0100, Karthik Nayak wrote:
> diff --git a/refs.h b/refs.h
> index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..67f8b3eef3f2101409e5cc6eb2241d99e9f7d95c 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -727,6 +727,18 @@ int ref_transaction_update(struct ref_transaction *transaction,
>  			   unsigned int flags, const char *msg,
>  			   struct strbuf *err);
>  
> +/*
> + * Similar to`ref_transaction_update`, but this function is only for adding
> + * a reflog update. Supports providing custom committer information.
> + */
> +int ref_transaction_update_reflog(struct ref_transaction *transaction,
> +				  const char *refname,
> +				  const struct object_id *new_oid,
> +				  const struct object_id *old_oid,
> +				  const char *committer_info, unsigned int flags,
> +				  const char *msg, unsigned int index,
> +				  struct strbuf *err);
> +

Nit: it would be great to explain what the index does in the doc, as it
is completely non-obvious.

Patrick

* Re: [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-13 10:36   ` [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-13 20:02       ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-13 12:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Fri, Dec 13, 2024 at 11:36:52AM +0100, Karthik Nayak wrote:
> The reference transaction only allows a single update for a given
> reference to avoid conflicts. This, however, isn't an issue for reflogs.
> There are no conflicts to be resolved in reflogs and when migrating
> reflogs between backends we'd have multiple reflog entries for the same
> refname.
> 
> So allow multiple reflog updates within a single transaction. Also the
> reflog creation logic isn't exposed to the end user. While this might
> change in the future, currently, this reduces the scope of issues to
> think about.
> 
> In the reftable backend, the writer sorts all updates based on the
> update_index before writing to the block. When there are multiple
> reflogs for a given refname, it is essential that the order of the
> reflogs is maintained. So add the `index` value to the `update_index`.
> The `index` field is only be set when multiple reflog entries for a

s/only be/only

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  				}
>  
>  				fill_reftable_log_record(log, &c);
> -				log->update_index = ts;
> +
> +				/*
> +				 * Updates are sorted by the writer. So updates for the same
> +				 * refname need to contain different update indices.
> +				 */
> +				log->update_index = ts + u->index;

Okay. So instead of tracking things via a map, we now rely on the caller
to provide the update index. And if they don't provide one then we
cannot guarantee ordering.

I guess that's a good solution. After all, there will only be a very
limited amount of callers in the first place, so I think it's fine to
shift the responsibility onto them to maintain reflog ordering. They're
also the only ones who really know about the actual ordering.
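
To make the ordering argument concrete, here is a small self-contained
sketch. It is not the reftable writer; `struct sketch_log`, the ascending
comparator and `sketch_assign_and_sort` are assumptions made for
illustration. It shows that once every entry for a refname gets a
distinct `ts + index`, a sort keyed on (refname, update_index) yields the
caller's order no matter how the entries were queued:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for a reflog record; field names are illustrative. */
struct sketch_log {
	const char *refname;
	uint64_t update_index;
	const char *msg;
};

/* Sort key: refname first, then update_index (ascending in this sketch). */
static int sketch_log_cmp(const void *a, const void *b)
{
	const struct sketch_log *x = a, *y = b;
	int c = strcmp(x->refname, y->refname);

	if (c)
		return c;
	if (x->update_index != y->update_index)
		return x->update_index < y->update_index ? -1 : 1;
	return 0;
}

/*
 * Gives every record the update index `ts + index[i]`, sorts the records
 * and returns the largest update index assigned (or ts when nr is zero).
 */
static uint64_t sketch_assign_and_sort(struct sketch_log *logs, size_t nr,
				       uint64_t ts, const unsigned int *index)
{
	uint64_t max_update_index = ts;
	size_t i;

	for (i = 0; i < nr; i++) {
		logs[i].update_index = ts + index[i];
		if (logs[i].update_index > max_update_index)
			max_update_index = logs[i].update_index;
	}
	qsort(logs, nr, sizeof(*logs), sketch_log_cmp);
	return max_update_index;
}
```

If all entries shared the same `ts`, the comparator would return 0 for
them and `qsort()` would give no guarantee about their relative order.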

> +				/*
> +				 * Note the max update_index so the limit can be set later on.
> +				 */
> +				if (log->update_index > max_update_index)
> +					max_update_index = log->update_index;

Makes sense, as well.

Patrick

* Re: [PATCH v2 8/8] refs: add support for migrating reflogs
  2024-12-13 10:36   ` [PATCH v2 8/8] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-13 12:24     ` Patrick Steinhardt
  2024-12-15 11:09       ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-13 12:24 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Fri, Dec 13, 2024 at 11:36:53AM +0100, Karthik Nayak wrote:
> diff --git a/refs.c b/refs.c
> index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2708,6 +2710,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
>  	return ret;
>  }
>  
> +struct reflog_migration_data {
> +	unsigned int *index;
> +	const char *refname;
> +	struct ref_store *old_refs;
> +	struct ref_transaction *transaction;
> +	struct strbuf *errbuf;
> +};
> +
> +static int migrate_one_reflog_entry(struct object_id *old_oid,
> +				    struct object_id *new_oid,
> +				    const char *committer,
> +				    timestamp_t timestamp, int tz,
> +				    const char *msg, void *cb_data)
> +{
> +	struct reflog_migration_data *data = cb_data;
> +	struct strbuf sb = STRBUF_INIT;
> +	const char *date;
> +	int ret;
> +
> +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
> +	/* committer contains name and email */
> +	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
> +
> +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
> +					    new_oid, old_oid, sb.buf,
> +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
> +					    (*data->index)++, data->errbuf);

This is where we now increment the reflog index to ensure a proper
ordering.

> +	strbuf_release(&sb);
> +
> +	return ret;
> +}

We're now allocating one buffer per reflog entry. We may want to
optimize this by having a scratch buffer in `migration_data`, which we
could then pass on via `reflog_migration_data`.
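
A sketch of the scratch-buffer pattern, written against a tiny stand-in
buffer rather than Git's `strbuf` so that it compiles on its own. All
names here (`struct sketch_buf`, `sketch_buf_reset`, `sketch_buf_addstr`,
`sketch_format_entry`) are hypothetical. The point is that the per-entry
callback resets and refills a caller-owned buffer instead of allocating
and releasing one each time:

```c
#include <stdlib.h>
#include <string.h>

/* Minimal growable string buffer; a stand-in, not Git's strbuf. */
struct sketch_buf {
	char *buf;
	size_t len, alloc;
};

/* Empties the buffer but keeps its allocation for the next entry. */
static void sketch_buf_reset(struct sketch_buf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

/* Appends s, growing only when needed. Returns 0 on success, -1 on OOM. */
static int sketch_buf_addstr(struct sketch_buf *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t alloc = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, alloc);

		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = alloc;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
	return 0;
}

/* Per-entry callback body: reuses the scratch buffer owned by the caller. */
static int sketch_format_entry(struct sketch_buf *scratch,
			       const char *committer, const char *date)
{
	sketch_buf_reset(scratch);
	if (sketch_buf_addstr(scratch, committer) ||
	    sketch_buf_addstr(scratch, " ") ||
	    sketch_buf_addstr(scratch, date))
		return -1;
	return 0;
}
```

The owner of the scratch buffer frees it once after the last entry has
been processed.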

> @@ -2910,6 +2940,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>  	if (ret < 0)
>  		goto done;
>  
> +	data.reflog_index = 1;

I'm a bit surprised that we initialize the reflog index here, because
that means we now have a globally increasing counter across all reflogs.
Couldn't we initialize the index per reflog instead? It ultimately does
not really matter, but feels like the more obvious design to me.

Also, is there any specific reason why we start at 1 and not 0? Just curious.
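
A sketch of the per-reflog alternative, using hypothetical names
(`struct sketch_reflog_data`, `sketch_each_entry`,
`sketch_migrate_reflog`) and a trivial loop in place of the real reflog
iteration. The callback data is zero-initialized for every reflog, so
each reflog's entries are numbered from 0:

```c
#include <stddef.h>

/* Callback data that lives for exactly one reflog. */
struct sketch_reflog_data {
	unsigned int index;     /* next index to hand out within this reflog */
	unsigned int *assigned; /* out: index given to each entry, in order */
	size_t nr;              /* number of entries seen so far */
};

/* Per-entry callback: records the current index, then increments it. */
static int sketch_each_entry(void *cb_data)
{
	struct sketch_reflog_data *data = cb_data;

	data->assigned[data->nr++] = data->index++;
	return 0;
}

/* Walks one reflog with nr_entries entries using fresh callback data. */
static void sketch_migrate_reflog(size_t nr_entries, unsigned int *assigned)
{
	struct sketch_reflog_data data = { 0 };
	size_t i;

	data.assigned = assigned;
	for (i = 0; i < nr_entries; i++)
		sketch_each_entry(&data);
}
```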

Patrick

* Re: [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-13 12:24     ` Patrick Steinhardt
@ 2024-12-13 19:43       ` karthik nayak
  2024-12-19 19:31         ` Toon Claes
  0 siblings, 1 reply; 93+ messages in thread
From: karthik nayak @ 2024-12-13 19:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Dec 13, 2024 at 11:36:50AM +0100, Karthik Nayak wrote:
>> The `ref_transaction_add_update()` creates the `ref_update` struct. To
>> facilitate addition of reflogs in the next commit, the function needs to
>> accommodate setting the `committer_info` field in the struct. So modify
>> the function to also take `committer_info` as an argument and set it
>> accordingly.
>
> I was wondering a bit whether we could instead pull out a
> `add_update_internal()` function so that we don't need to modify all
> callers of `ref_transaction_add_update()`. Because ultimately, we don't
> use the field anywhere except from `ref_transaction_add_reflog_update()`
> as far as I can see.
>
> This is more of a thought than a strong opinion, so feel free to ignore.
>

Yes, that is a possible change, but the number of code changes is
relatively low and I didn't think it made much of a difference. It would
also mean one more function. But I don't mind doing it either; if anyone
feels strongly about it, I'll happily make that change.

>> @@ -1190,8 +1191,12 @@ struct ref_update *ref_transaction_add_update(
>>  		oidcpy(&update->new_oid, new_oid);
>>  	if ((flags & REF_HAVE_OLD) && old_oid)
>>  		oidcpy(&update->old_oid, old_oid);
>> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
>> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
>> +		if (committer_info)
>> +			update->committer_info = xstrdup(committer_info);
>> +
>>  		update->msg = normalize_reflog_message(msg);
>> +	}
>>
>>  	return update;
>>  }
>
> This can use `xstrdup_or_null()` and then drop the condition.
>
> Patrick

That is a good improvement, will add.

Thanks!

* Re: [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-13 11:44     ` Christian Couder
@ 2024-12-13 19:49       ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-13 19:49 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, ps, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> On Fri, Dec 13, 2024 at 11:36 AM Karthik Nayak <karthik.188@gmail.com> wrote:
>
>> +int ref_transaction_update_reflog(struct ref_transaction *transaction,
>> +                                 const char *refname,
>> +                                 const struct object_id *new_oid,
>> +                                 const struct object_id *old_oid,
>> +                                 const char *committer_info, unsigned int flags,
>> +                                 const char *msg, unsigned int index,
>> +                                 struct strbuf *err)
>> +{
>> +       struct ref_update *update;
>> +
>> +       assert(err);
>> +
>> +       if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
>> +               return -1;
>> +
>> +       flags |= REF_LOG_ONLY | REF_NO_DEREF;
>
> If we could switch the above lines like this:
>
>       flags |= REF_LOG_ONLY | REF_NO_DEREF;
>
>       if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
>                return -1;
>
> maybe we wouldn't need transaction_refname_valid() to take an
> 'unsigned int reflog' argument and we could instead use 'flags &
> REF_LOG_ONLY' inside that function?
>

The issue is that this changes existing behavior, since
`ref_transaction_update()` can also be called with the `REF_LOG_ONLY`
flag set.

But, I think it is a worthwhile change, because earlier even for reflog
updates we would show 'refusing to update ref ...' as the message. I'll
add this in and also make a note in the commit message.

Thanks

>> +       update = ref_transaction_add_update(transaction, refname, flags,
>> +                                           new_oid, old_oid, NULL, NULL,
>> +                                           committer_info, msg);
>> +       /*
>> +        * While we do set the old_oid value, we unset the flag to skip
>> +        * old_oid verification which only makes sense for refs.
>> +        */
>> +       update->flags &= ~REF_HAVE_OLD;
>> +       update->index = index;
>> +
>>         return 0;
>>  }

* Re: [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-13 12:24     ` Patrick Steinhardt
@ 2024-12-13 20:02       ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-13 20:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Dec 13, 2024 at 11:36:52AM +0100, Karthik Nayak wrote:
>> The reference transaction only allows a single update for a given
>> reference to avoid conflicts. This, however, isn't an issue for reflogs.
>> There are no conflicts to be resolved in reflogs and when migrating
>> reflogs between backends we'd have multiple reflog entries for the same
>> refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> In the reftable backend, the writer sorts all updates based on the
>> update_index before writing to the block. When there are multiple
>> reflogs for a given refname, it is essential that the order of the
>> reflogs is maintained. So add the `index` value to the `update_index`.
>> The `index` field is only be set when multiple reflog entries for a
>
> s/only be/only

Thanks.

>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  				}
>>
>>  				fill_reftable_log_record(log, &c);
>> -				log->update_index = ts;
>> +
>> +				/*
>> +				 * Updates are sorted by the writer. So updates for the same
>> +				 * refname need to contain different update indices.
>> +				 */
>> +				log->update_index = ts + u->index;
>
> Okay. So instead of tracking things via a map, we now rely on the caller
> to provide the update index. And if they don't provide one then we
> cannot guarantee ordering.
>

Which works, because for reflog migration, the caller _has_ to specify
ordering.

> I guess that's a good solution. After all, there will only be a very
> limited amount of callers in the first place, so I think it's fine to
> shift the responsibility onto them to maintain reflog ordering. They're
> also the only ones who really know about the actual ordering.
>

Exactly. Currently this is used only in the migration code. We may
expose reflog addition to users at some point, but that is not something
being pursued right now.

>> +				/*
>> +				 * Note the max update_index so the limit can be set later on.
>> +				 */
>> +				if (log->update_index > max_update_index)
>> +					max_update_index = log->update_index;
>
> Makes sense, as well.
>
> Patrick

Thanks

* Re: [PATCH v2 8/8] refs: add support for migrating reflogs
  2024-12-13 12:24     ` Patrick Steinhardt
@ 2024-12-15 11:09       ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-15 11:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Dec 13, 2024 at 11:36:53AM +0100, Karthik Nayak wrote:
>> diff --git a/refs.c b/refs.c
>> index 9f539369bc94a25594adc3e95847f2fe72f58a08..f19292d50f0003881220e8f7cfcf6c7eb4b2e749 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -2708,6 +2710,53 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
>>  	return ret;
>>  }
>>
>> +struct reflog_migration_data {
>> +	unsigned int *index;
>> +	const char *refname;
>> +	struct ref_store *old_refs;
>> +	struct ref_transaction *transaction;
>> +	struct strbuf *errbuf;
>> +};
>> +
>> +static int migrate_one_reflog_entry(struct object_id *old_oid,
>> +				    struct object_id *new_oid,
>> +				    const char *committer,
>> +				    timestamp_t timestamp, int tz,
>> +				    const char *msg, void *cb_data)
>> +{
>> +	struct reflog_migration_data *data = cb_data;
>> +	struct strbuf sb = STRBUF_INIT;
>> +	const char *date;
>> +	int ret;
>> +
>> +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
>> +	/* committer contains name and email */
>> +	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
>> +
>> +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
>> +					    new_oid, old_oid, sb.buf,
>> +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
>> +					    (*data->index)++, data->errbuf);
>
> This is where we now increment the reflog index to ensure a proper
> ordering.
>
>> +	strbuf_release(&sb);
>> +
>> +	return ret;
>> +}
>
> We're now allocating one buffer per reflog entry. We may want to
> optimize this by having a scratch buffer in `migration_data`, which we
> could then pass on via `reflog_migration_data`.
>

That makes sense, let me do that.

>> @@ -2910,6 +2940,11 @@ int repo_migrate_ref_storage_format(struct repository *repo,
>>  	if (ret < 0)
>>  		goto done;
>>
>> +	data.reflog_index = 1;
>
> I'm a bit surprised that we initialize the reflog index here, because
> that means we now have a globally increasing counter across all reflogs.
> Couldn't we initialize the index per reflog instead? It ultimately does
> not really matter, but feels like the more obvious design to me.

Yes, this was needed because I initially didn't understand how the
update_index worked and assumed two logs couldn't have the same
update_index. I missed changing it. Like you said, it works, but I'll
fix it.

> Also, is there any specific reason why we start at 1 and not 0? Just curious.

Not really, I wanted to distinguish between indexed and non-indexed
entries. But logically, no. I'll remove it to avoid any confusion.

> Patrick

Thanks

* [PATCH v3 0/8] refs: add reflog support to `git refs migrate`
  2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
                     ` (7 preceding siblings ...)
  2024-12-13 10:36   ` [PATCH v2 8/8] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-15 16:25   ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
                       ` (9 more replies)
  8 siblings, 10 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the feature was that it didn't support migrating
repositories which contained reflogs. This isn't a requirement on the
server side as repositories are stored as bare repositories (which do
not contain any reflogs). Clients however generally use reflogs and
until now couldn't use the `git refs migrate` command to migrate their
repositories to the new reftable format.

One of the issues with adding reflog support is that ref transactions
don't support reflog additions:
  1. While there is REF_LOG_ONLY flag, there is no function to utilize
  the flag and add reflogs.
  2. reference backends generally sort the updates by the refname. This
  wouldn't work for reflogs which need to ensure that they maintain the
  order of creation.
  3. In the files backend, reflog entries are added by obtaining locks
  on the refs themselves. This means each update in the transaction will
  obtain a ref_lock. This paradigm fails to accommodate the fact that there
  could be multiple reflog updates for a refname in a single transaction.
  4. The backends check for duplicate entries, which doesn't make sense
  in the context of adding multiple reflogs for a given refname.

To overcome these issues, we make the following changes:
  - Update the ref_update structure to also include the committer
  information. Using this, we can add a new function which only adds
  reflog updates to the transaction.
  - Add an index field to the ref_update structure. This helps order
  updates in a pre-defined order, which fixes #2.
  - While the ideal fix for #3 would be to actually introduce reflog
  locks, this wouldn't be possible without breaking backward
  compatibility. So we add a count field to the existing ref_lock. With
  this, multiple reflog updates can share a single ref_lock.
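
The shared-lock idea in the last item can be pictured as plain reference
counting. The following is only a sketch under that assumption, with
hypothetical names (`struct sketch_lock`, `sketch_lock_acquire`,
`sketch_lock_release`); it does not reproduce the files backend or its
lockfile handling:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a per-ref lock that several updates may share. */
struct sketch_lock {
	char *refname;
	unsigned int count; /* number of updates currently using this lock */
};

/*
 * Returns the existing lock with its count bumped when it already covers
 * refname; otherwise creates a new lock with a count of 1.
 * Returns NULL on allocation failure.
 */
static struct sketch_lock *sketch_lock_acquire(struct sketch_lock *existing,
					       const char *refname)
{
	struct sketch_lock *lock;

	if (existing && !strcmp(existing->refname, refname)) {
		existing->count++;
		return existing;
	}

	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return NULL;
	lock->refname = malloc(strlen(refname) + 1);
	if (!lock->refname) {
		free(lock);
		return NULL;
	}
	strcpy(lock->refname, refname);
	lock->count = 1;
	return lock;
}

/* Drops one user; frees the lock and returns 1 once the last user is gone. */
static int sketch_lock_release(struct sketch_lock *lock)
{
	if (--lock->count)
		return 0;
	free(lock->refname);
	free(lock);
	return 1;
}
```

With this shape, several reflog updates for the same refname take the
lock once and only the final release actually unlocks.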

Overall, this series is a bit more involved, and I would appreciate it
if it receives a bit more scrutiny.

The series is based on top of e66fd72e97 (The fourteenth batch,
2024-12-06) with `kn/reftable-writer-log-write-verify` merged in.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v3:
- patch 5: Use `xstrdup_or_null` unconditionally. 
- patch 6: In `transaction_refname_valid()` use the transaction flags
  to identify reflogs. Update the documentation to also mention the
  purpose of the `index` field.
- patch 8: Instead of allocating an strbuf for each reflog entry, we 
  store and re-use one in the migration callback data.
- patch 8: Don't use a global index increment for all reflogs entries,
  instead create and use one per reflog.
- patch 8: Avoid setting the first reflog index to `1`. It now defaults
  to `0` as the first index, which is okay, since the index is incremented
  for consecutive reflog entries.
- Small typo fixes.
- Thanks to Christian and Patrick for the review!
- Link to v2: https://lore.kernel.org/all/20241213-320-git-refs-migrate-reflogs-v2-0-f28312cdb6c0@gmail.com/

Changes in v2:
- Split patch 5 into two separate patches. This should make it easier to
  review and reduce cognitive load in a single patch.
- In reftable backend, instead of using `strmapint` to ensure we have
  new update_indexes for reflogs with the same refname, we now use the
  already available `update->index` field to increment the update_index.
- Cleanup the code and follow some of the better practices.
- Add some clarity to the commit messages.
- Link to v1: https://lore.kernel.org/r/20241209-320-git-refs-migrate-reflogs-v1-0-d4bc37ee860f@gmail.com

---
Karthik Nayak (8):
      refs: include committer info in `ref_update` struct
      refs: add `index` field to `struct ref_udpate`
      refs/files: add count field to ref_lock
      refs: extract out refname verification in transactions
      refs: add `committer_info` to `ref_transaction_add_update()`
      refs: introduce the `ref_transaction_update_reflog` function
      refs: allow multiple reflog entries for the same refname
      refs: add support for migrating reflogs

 Documentation/git-refs.txt |   2 -
 refs.c                     | 165 +++++++++++++++++++++++++++++++++------------
 refs.h                     |  14 ++++
 refs/files-backend.c       | 131 ++++++++++++++++++++++-------------
 refs/refs-internal.h       |   9 +++
 refs/reftable-backend.c    |  53 ++++++++++++---
 t/t1460-refs-migrate.sh    |  73 ++++++++++++++------
 7 files changed, 327 insertions(+), 120 deletions(-)
---

Range-diff versus v2:

1:  d9bd20468c = 1:  d1c06e34bf refs: include committer info in `ref_update` struct
2:  8478eeac95 = 2:  33e65a965a refs: add `index` field to `struct ref_udpate`
3:  913fd320f0 = 3:  34c5beccb6 refs/files: add count field to ref_lock
4:  66b86b5807 = 4:  4fceb64954 refs: extract out refname verification in transactions
5:  33ad1774d4 ! 5:  3c5abd0047 refs: add `committer_info` to `ref_transaction_add_update()`
    @@ refs.c: struct ref_update *ref_transaction_add_update(
      		oidcpy(&update->old_oid, old_oid);
     -	if (!(flags & REF_SKIP_CREATE_REFLOG))
     +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
    -+		if (committer_info)
    -+			update->committer_info = xstrdup(committer_info);
    -+
    ++		update->committer_info = xstrdup_or_null(committer_info);
      		update->msg = normalize_reflog_message(msg);
     +	}
      
6:  cdbb15b11a ! 6:  9e12f16b96 refs: introduce the `ref_transaction_update_reflog` function
    @@ Commit message
           means clients can add reflog entries with custom committer
           information.
     
    +    The `transaction_refname_valid()` function also modifies the error
    +    message selectively based on the type of the update. This change also
    +    affects reflog updates which go through `ref_transaction_update()`.
    +
         A follow up commit will utilize this function to add reflog support to
         `git refs migrate`.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## refs.c ##
    -@@ refs.c: struct ref_update *ref_transaction_add_update(
    - 
    - static int transaction_refname_valid(const char *refname,
    - 				     const struct object_id *new_oid,
    --				     unsigned int flags, struct strbuf *err)
    -+				     unsigned int flags, unsigned int reflog,
    -+				     struct strbuf *err)
    - {
    - 	if (flags & REF_SKIP_REFNAME_VERIFICATION)
    +@@ refs.c: static int transaction_refname_valid(const char *refname,
      		return 1;
      
      	if (is_pseudo_ref(refname)) {
     -		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
     -			    refname);
    -+		const char *what = reflog ? "reflog for pseudoref" : "pseudoref";
    ++		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
     +		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
      		return 0;
      	} else if ((new_oid && !is_null_oid(new_oid)) ?
    @@ refs.c: struct ref_update *ref_transaction_add_update(
      		 !refname_is_safe(refname)) {
     -		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
     -			    refname);
    -+		const char *what = reflog ? "reflog with bad name" : "ref with bad name";
    ++		const char *what = flags & REF_LOG_ONLY ? "reflog with bad name" : "ref with bad name";
     +		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
      		return 0;
      	}
      
    -@@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
    - 		return -1;
    - 	}
    - 
    --	if (!transaction_refname_valid(refname, new_oid, flags, err))
    -+	if (!transaction_refname_valid(refname, new_oid, flags, 0, err))
    - 		return -1;
    - 
    - 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
     @@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
      	ref_transaction_add_update(transaction, refname, flags,
      				   new_oid, old_oid, new_target,
    @@ refs.c: int ref_transaction_update(struct ref_transaction *transaction,
     +
     +	assert(err);
     +
    -+	if (!transaction_refname_valid(refname, new_oid, flags, 1, err))
    -+		return -1;
    -+
     +	flags |= REF_LOG_ONLY | REF_NO_DEREF;
     +
    ++	if (!transaction_refname_valid(refname, new_oid, flags, err))
    ++		return -1;
    ++
     +	update = ref_transaction_add_update(transaction, refname, flags,
     +					    new_oid, old_oid, NULL, NULL,
     +					    committer_info, msg);
    @@ refs.h: int ref_transaction_update(struct ref_transaction *transaction,
      
     +/*
     + * Similar to`ref_transaction_update`, but this function is only for adding
    -+ * a reflog update. Supports providing custom committer information.
    ++ * a reflog update. Supports providing custom committer information. The index
    ++ * field can be utiltized to order updates as desired. When not used, the
    ++ * updates default to being ordered by refname.
     + */
     +int ref_transaction_update_reflog(struct ref_transaction *transaction,
     +				  const char *refname,
7:  dffc14e1a3 ! 7:  4d76cf4773 refs: allow multiple reflog entries for the same refname
    @@ Commit message
         update_index before writing to the block. When there are multiple
         reflogs for a given refname, it is essential that the order of the
         reflogs is maintained. So add the `index` value to the `update_index`.
    -    The `index` field is only be set when multiple reflog entries for a
    -    given refname are added and as such in most scenarios the old behavior
    +    The `index` field is only set when multiple reflog entries for a given
    +    refname are added and as such in most scenarios the old behavior
         remains.
     
         This is required to add reflog migration support to `git refs migrate`.
8:  481d185e6e ! 8:  31cc392d8d refs: add support for migrating reflogs
    @@ refs.c
      
      /*
       * List of all available backends
    -@@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update *update,
    - }
    - 
    - struct migration_data {
    -+	unsigned int reflog_index;
    +@@ refs.c: struct migration_data {
      	struct ref_store *old_refs;
      	struct ref_transaction *transaction;
      	struct strbuf *errbuf;
    ++	struct strbuf sb;
    + };
    + 
    + static int migrate_one_ref(const char *refname, const char *referent UNUSED, const struct object_id *oid,
     @@ refs.c: static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
      	return ret;
      }
      
     +struct reflog_migration_data {
    -+	unsigned int *index;
    ++	unsigned int index;
     +	const char *refname;
     +	struct ref_store *old_refs;
     +	struct ref_transaction *transaction;
     +	struct strbuf *errbuf;
    ++	struct strbuf *sb;
     +};
     +
     +static int migrate_one_reflog_entry(struct object_id *old_oid,
    @@ refs.c: static int migrate_one_ref(const char *refname, const char *referent UNU
     +				    const char *msg, void *cb_data)
     +{
     +	struct reflog_migration_data *data = cb_data;
    -+	struct strbuf sb = STRBUF_INIT;
     +	const char *date;
     +	int ret;
     +
     +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
    ++	strbuf_reset(data->sb);
     +	/* committer contains name and email */
    -+	strbuf_addstr(&sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
    ++	strbuf_addstr(data->sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
     +
     +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
    -+					    new_oid, old_oid, sb.buf,
    ++					    new_oid, old_oid, data->sb->buf,
     +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
    -+					    (*data->index)++, data->errbuf);
    -+	strbuf_release(&sb);
    -+
    ++					    data->index++, data->errbuf);
     +	return ret;
     +}
     +
    @@ refs.c: static int migrate_one_ref(const char *refname, const char *referent UNU
     +	data.old_refs = migration_data->old_refs;
     +	data.transaction = migration_data->transaction;
     +	data.errbuf = migration_data->errbuf;
    -+	data.index = &migration_data->reflog_index;
    ++	data.sb = &migration_data->sb;
     +
     +	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
     +					migrate_one_reflog_entry, &data);
    @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	 */
      	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
      	if (!mkdtemp(new_gitdir.buf)) {
    +@@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
    + 	data.old_refs = old_refs;
    + 	data.transaction = transaction;
    + 	data.errbuf = errbuf;
    ++	strbuf_init(&data.sb, 0);
    + 
    + 	/*
    + 	 * We need to use the internal `do_for_each_ref()` here so that we can
     @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	if (ret < 0)
      		goto done;
      
    -+	data.reflog_index = 1;
     +	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
     +	if (ret < 0)
     +		goto done;
    @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	ret = ref_transaction_commit(transaction, errbuf);
      	if (ret < 0)
      		goto done;
    +@@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
    + 	}
    + 	ref_transaction_free(transaction);
    + 	strbuf_release(&new_gitdir);
    ++	strbuf_release(&data.sb);
    + 	return ret;
    + }
    + 
     
      ## t/t1460-refs-migrate.sh ##
     @@ t/t1460-refs-migrate.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME


--- 

base-commit: 09245f4b75863f4e94dac7feebaafce53a26965f
change-id: 20241111-320-git-refs-migrate-reflogs-a53e3a6cffc9

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH v3 1/8] refs: include committer info in `ref_update` struct
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 2/8] refs: add `index` field to `struct ref_update` Karthik Nayak
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface for creating reflogs, including custom
committer information.

Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
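The fallback described above can be sketched in isolation. This is a
minimal illustration, not the patch's code: `demo_default_committer()`
and `demo_format_reflog_line()` are hypothetical names, and the default
identity string is made up. It only shows the idea that a NULL committer
is resolved at write time, while a caller-supplied one is used verbatim.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git_committer_info(0); not the real function. */
static const char *demo_default_committer(void)
{
	return "Default User <default@example.com> 1700000000 +0000";
}

/*
 * Format one reflog line into `buf`. A NULL `committer` means "use the
 * default identity", which is resolved here, at write time, rather than
 * when the update was queued. Returns the value of snprintf().
 */
static int demo_format_reflog_line(char *buf, size_t size,
				   const char *old_hex, const char *new_hex,
				   const char *committer, const char *msg)
{
	if (!committer)
		committer = demo_default_committer();

	/* The message is optional and separated from the identity by a tab. */
	if (msg && *msg)
		return snprintf(buf, size, "%s %s %s\t%s\n",
				old_hex, new_hex, committer, msg);
	return snprintf(buf, size, "%s %s %s\n", old_hex, new_hex, committer);
}
```

A migration would pass the identity read from the old reflog entry,
whereas ordinary updates would keep passing NULL.
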
 refs.c                  |  1 +
 refs/files-backend.c    | 24 ++++++++++++++----------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 12 +++++++++++-
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/refs.c b/refs.c
index 762f3e324d59c60cd4f05c2f257e54de8deb00e5..f003e51c6bf5229bfbce8ce61ffad7cdba0572e0 100644
--- a/refs.c
+++ b/refs.c
@@ -1151,6 +1151,7 @@ void ref_transaction_free(struct ref_transaction *transaction)
 
 	for (i = 0; i < transaction->nr; i++) {
 		free(transaction->updates[i]->msg);
+		free(transaction->updates[i]->committer_info);
 		free((char *)transaction->updates[i]->new_target);
 		free((char *)transaction->updates[i]->old_target);
 		free(transaction->updates[i]);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..6078668c99ee254e794e3ba49689aa34e6022efd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 	struct strbuf sb = STRBUF_INIT;
 	int ret = 0;
 
+	if (!committer)
+		committer = git_committer_info(0);
+
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg) {
 		strbuf_addch(&sb, '\t');
@@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 }
 
 static int files_log_ref_write(struct files_ref_store *refs,
-			       const char *refname, const struct object_id *old_oid,
-			       const struct object_id *new_oid, const char *msg,
+			       const char *refname,
+			       const struct object_id *old_oid,
+			       const struct object_id *new_oid,
+			       const char *committer_info, const char *msg,
 			       int flags, struct strbuf *err)
 {
 	int logfd, result;
@@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
 
 	if (logfd < 0)
 		return 0;
-	result = log_ref_write_fd(logfd, old_oid, new_oid,
-				  git_committer_info(0), msg);
+	result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);
 	if (result) {
 		struct strbuf sb = STRBUF_INIT;
 		int save_errno = errno;
@@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
 	files_assert_main_repository(refs, "commit_ref_update");
 
 	clear_loose_ref_cache(refs);
-	if (files_log_ref_write(refs, lock->ref_name,
-				&lock->old_oid, oid,
+	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
 				logmsg, flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 		strbuf_addf(err, "cannot update the ref '%s': %s",
@@ -2007,9 +2010,9 @@ static int commit_ref_update(struct files_ref_store *refs,
 		if (head_ref && (head_flag & REF_ISSYMREF) &&
 		    !strcmp(head_ref, lock->ref_name)) {
 			struct strbuf log_err = STRBUF_INIT;
-			if (files_log_ref_write(refs, "HEAD",
-						&lock->old_oid, oid,
-						logmsg, flags, &log_err)) {
+			if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
+						oid, NULL, logmsg, flags,
+						&log_err)) {
 				error("%s", log_err.buf);
 				strbuf_release(&log_err);
 			}
@@ -2969,7 +2972,8 @@ static int parse_and_write_reflog(struct files_ref_store *refs,
 	}
 
 	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid,
-				&update->new_oid, update->msg, update->flags, err)) {
+				&update->new_oid, update->committer_info,
+				update->msg, update->flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 
 		strbuf_addf(err, "cannot update the ref '%s': %s",
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 58aa56d1b27c85d606ed7c8c0d908e4b87d1066b..0fd95cdacd99e4a728c22f5286f6b3f0f360c110 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -113,6 +113,7 @@ struct ref_update {
 	void *backend_data;
 	unsigned int type;
 	char *msg;
+	char *committer_info;
 
 	/*
 	 * If this ref_update was split off of a symref update via
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			}
 
 			if (create_reflog) {
+				struct ident_split c;
+
 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 				log = &logs[logs_nr++];
 				memset(log, 0, sizeof(*log));
 
-				fill_reftable_log_record(log, &committer_ident);
+				if (u->committer_info) {
+					if (split_ident_line(&c, u->committer_info,
+							     strlen(u->committer_info)))
+						BUG("failed splitting committer info");
+				} else {
+					c = committer_ident;
+				}
+
+				fill_reftable_log_record(log, &c);
 				log->update_index = ts;
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 2/8] refs: add `index` field to `struct ref_update`
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 3/8] refs/files: add count field to ref_lock Karthik Nayak
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reftable backend sorts its updates by refname before applying them;
this ensures that the references are stored sorted. When migrating
reflogs from one backend to another, the order of the reflogs must be
maintained. Add a new `index` field to the `ref_update` struct to
facilitate this.

This field is used in the reftable backend's sort comparison function
`transaction_update_cmp` to ensure that indexed updates maintain their
order.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
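As a rough, standalone sketch of the ordering rule described above:
`struct demo_update` and `demo_update_cmp()` are hypothetical names, and
the struct is a simplified stand-in for the real update record. Unlike
the patch, which subtracts the two indexes, this sketch compares them
explicitly so that very large unsigned values cannot wrap around. That
is a choice made for the illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a queued update. */
struct demo_update {
	const char *refname;
	unsigned int index; /* 0 means "no explicit order requested" */
};

/*
 * qsort() comparator: if either side carries an index, order by index;
 * otherwise fall back to ordering by refname.
 */
static int demo_update_cmp(const void *a, const void *b)
{
	const struct demo_update *ua = a;
	const struct demo_update *ub = b;

	if (ua->index || ub->index) {
		/* Explicit comparison avoids unsigned wraparound. */
		if (ua->index < ub->index)
			return -1;
		if (ua->index > ub->index)
			return 1;
		return 0;
	}

	return strcmp(ua->refname, ub->refname);
}
```

With this rule, several reflog entries for the same refname keep the
order given by their indexes, while updates without an index are still
sorted by name.
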
 refs/refs-internal.h    |  7 +++++++
 refs/reftable-backend.c | 13 +++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0fd95cdacd99e4a728c22f5286f6b3f0f360c110..f5c733d099f0c6f1076a25f4f77d9d5eb345ec87 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -115,6 +115,13 @@ struct ref_update {
 	char *msg;
 	char *committer_info;
 
+	/*
+	 * The index overrides the default sort algorithm. This is needed
+	 * when migrating reflogs and we want to ensure we carry over the
+	 * same order.
+	 */
+	unsigned int index;
+
 	/*
 	 * If this ref_update was split off of a symref update via
 	 * split_symref_update(), then this member points at that
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
 
 static int transaction_update_cmp(const void *a, const void *b)
 {
-	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
-		      ((struct reftable_transaction_update *)b)->update->refname);
+	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
+	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
+
+	/*
+	 * If there is an index set, it should take precedence (default is 0).
+	 * This ensures that updates with indexes are sorted amongst themselves.
+	 */
+	if (update_a->update->index || update_b->update->index)
+		return update_a->update->index - update_b->update->index;
+
+	return strcmp(update_a->update->refname, update_b->update->refname);
 }
 
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 3/8] refs/files: add count field to ref_lock
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 2/8] refs: add `index` field to `struct ref_update` Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 4/8] refs: extract out refname verification in transactions Karthik Nayak
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.

The upcoming patches will add support for reflog updates in ref
transactions. This means that a single transaction may contain both ref
updates and reflog updates. For a given ref in a given transaction there
can be at most one ref update. But we can theoretically have multiple
reflog updates for a given ref in a given transaction. A good example of
this is migrating reflogs from one backend to another, where we batch
all the reflog updates for a given reference in a single transaction.

The current flow does not support this, because refs and reflogs are
currently treated as a single entity and acquire the lock together. To
separate them, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.

This patch only adds the `count` field to `ref_lock` and adds the logic
to increment and decrement the count. In a follow-up commit, we'll
separate the reflog update logic from ref updates and utilize this
functionality.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
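The counting scheme above can be illustrated with a small standalone
sketch. All names here (`struct demo_lock`, `demo_lock_new()`,
`demo_lock_share()`, `demo_unlock()`, `demo_locks_released`) are
hypothetical, and the real lock also wraps a lock file, which is left
out. The sketch only shows that the underlying resource is released when
the last holder lets go.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified lock: the real one also owns a lock file. */
struct demo_lock {
	char *ref_name;
	unsigned int count; /* number of updates currently holding the lock */
};

/* Counts how many locks were actually torn down (for demonstration). */
static int demo_locks_released;

/* Create a lock for `refname`, held by exactly one user. */
static struct demo_lock *demo_lock_new(const char *refname)
{
	struct demo_lock *lock = calloc(1, sizeof(*lock));
	size_t len = strlen(refname) + 1;

	if (!lock)
		abort();
	lock->ref_name = malloc(len);
	if (!lock->ref_name)
		abort();
	memcpy(lock->ref_name, refname, len);
	lock->count = 1;
	return lock;
}

/* A further update for the same refname reuses the existing lock. */
static void demo_lock_share(struct demo_lock *lock)
{
	lock->count++;
}

/* Drop one holder; tear the lock down only when nobody holds it. */
static void demo_unlock(struct demo_lock *lock)
{
	lock->count--;
	if (lock->count)
		return;

	free(lock->ref_name);
	free(lock);
	demo_locks_released++;
}
```

In the patch, the lookup of an existing lock by refname is done through
a `strmap` kept in the transaction's backend data; the sketch leaves
that lookup to the caller.
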
 refs/files-backend.c | 58 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6078668c99ee254e794e3ba49689aa34e6022efd..02cb4907d8659e87a227fed4f60a5f6606be8764 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -71,6 +71,7 @@ struct ref_lock {
 	char *ref_name;
 	struct lock_file lk;
 	struct object_id old_oid;
+	unsigned int count; /* track users of the lock (ref update + reflog updates) */
 };
 
 struct files_ref_store {
@@ -638,9 +639,12 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 
 static void unlock_ref(struct ref_lock *lock)
 {
-	rollback_lock_file(&lock->lk);
-	free(lock->ref_name);
-	free(lock);
+	lock->count--;
+	if (!lock->count) {
+		rollback_lock_file(&lock->lk);
+		free(lock->ref_name);
+		free(lock);
+	}
 }
 
 /*
@@ -696,6 +700,7 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	*lock_p = CALLOC_ARRAY(lock, 1);
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 	files_ref_path(refs, &ref_file, refname);
 
 retry:
@@ -1169,6 +1174,7 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 		goto error_return;
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 
 	if (raceproof_create_file(ref_file.buf, create_reflock, &lock->lk)) {
 		unable_to_lock_message(ref_file.buf, errno, err);
@@ -2535,6 +2541,12 @@ static int check_old_oid(struct ref_update *update, struct object_id *oid,
 	return -1;
 }
 
+struct files_transaction_backend_data {
+	struct ref_transaction *packed_transaction;
+	int packed_refs_locked;
+	struct strmap ref_locks;
+};
+
 /*
  * Prepare for carrying out update:
  * - Lock the reference referred to by update.
@@ -2557,11 +2569,14 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 {
 	struct strbuf referent = STRBUF_INIT;
 	int mustexist = ref_update_expects_existing_old_ref(update);
+	struct files_transaction_backend_data *backend_data;
 	int ret = 0;
 	struct ref_lock *lock;
 
 	files_assert_main_repository(refs, "lock_ref_for_update");
 
+	backend_data = transaction->backend_data;
+
 	if ((update->flags & REF_HAVE_NEW) && ref_update_has_null_new_value(update))
 		update->flags |= REF_DELETING;
 
@@ -2572,18 +2587,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			goto out;
 	}
 
-	ret = lock_raw_ref(refs, update->refname, mustexist,
-			   affected_refnames,
-			   &lock, &referent,
-			   &update->type, err);
-	if (ret) {
-		char *reason;
+	lock = strmap_get(&backend_data->ref_locks, update->refname);
+	if (lock) {
+		lock->count++;
+	} else {
+		ret = lock_raw_ref(refs, update->refname, mustexist,
+				   affected_refnames,
+				   &lock, &referent,
+				   &update->type, err);
+		if (ret) {
+			char *reason;
+
+			reason = strbuf_detach(err, NULL);
+			strbuf_addf(err, "cannot lock ref '%s': %s",
+				    ref_update_original_update_refname(update), reason);
+			free(reason);
+			goto out;
+		}
 
-		reason = strbuf_detach(err, NULL);
-		strbuf_addf(err, "cannot lock ref '%s': %s",
-			    ref_update_original_update_refname(update), reason);
-		free(reason);
-		goto out;
+		strmap_put(&backend_data->ref_locks, update->refname, lock);
 	}
 
 	update->backend_data = lock;
@@ -2730,11 +2752,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	return ret;
 }
 
-struct files_transaction_backend_data {
-	struct ref_transaction *packed_transaction;
-	int packed_refs_locked;
-};
-
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
@@ -2767,6 +2784,8 @@ static void files_transaction_cleanup(struct files_ref_store *refs,
 		if (backend_data->packed_refs_locked)
 			packed_refs_unlock(refs->packed_ref_store);
 
+		strmap_clear(&backend_data->ref_locks, 0);
+
 		free(backend_data);
 	}
 
@@ -2796,6 +2815,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		goto cleanup;
 
 	CALLOC_ARRAY(backend_data, 1);
+	strmap_init(&backend_data->ref_locks);
 	transaction->backend_data = backend_data;
 
 	/*

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 4/8] refs: extract out refname verification in transactions
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (2 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 3/8] refs/files: add count field to ref_lock Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified for:

  - Ensuring it is not a pseudoref.
  - Checking the refname format.

These checks will also be needed in a following commit where the
function to add reflog updates to the transaction is introduced. Extract
the code out into a new static function.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801 100644
--- a/refs.c
+++ b/refs.c
@@ -1196,6 +1196,28 @@ struct ref_update *ref_transaction_add_update(
 	return update;
 }
 
+static int transaction_refname_valid(const char *refname,
+				     const struct object_id *new_oid,
+				     unsigned int flags, struct strbuf *err)
+{
+	if (flags & REF_SKIP_REFNAME_VERIFICATION)
+		return 1;
+
+	if (is_pseudo_ref(refname)) {
+		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
+			    refname);
+		return 0;
+	} else if ((new_oid && !is_null_oid(new_oid)) ?
+		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
+		 !refname_is_safe(refname)) {
+		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
+			    refname);
+		return 0;
+	}
+
+	return 1;
+}
+
 int ref_transaction_update(struct ref_transaction *transaction,
 			   const char *refname,
 			   const struct object_id *new_oid,
@@ -1213,21 +1235,8 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    ((new_oid && !is_null_oid(new_oid)) ?
-		     check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
-			   !refname_is_safe(refname))) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+	if (!transaction_refname_valid(refname, new_oid, flags, err))
 		return -1;
-	}
-
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
-		return -1;
-	}
 
 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
 		BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (3 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 4/8] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `ref_transaction_add_update()` function creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  7 +++++--
 refs/files-backend.c    | 14 ++++++++------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c |  6 ++++--
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801..782bf1090af65196263a3c35ed18d878bb4f2967 100644
--- a/refs.c
+++ b/refs.c
@@ -1166,6 +1166,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg)
 {
 	struct ref_update *update;
@@ -1190,8 +1191,10 @@ struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if ((flags & REF_HAVE_OLD) && old_oid)
 		oidcpy(&update->old_oid, old_oid);
-	if (!(flags & REF_SKIP_CREATE_REFLOG))
+	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
+		update->committer_info = xstrdup_or_null(committer_info);
 		update->msg = normalize_reflog_message(msg);
+	}
 
 	return update;
 }
@@ -1253,7 +1256,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
 
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
-				   old_target, msg);
+				   old_target, NULL, msg);
 	return 0;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 02cb4907d8659e87a227fed4f60a5f6606be8764..255fed8354cae982f785b1b85340e2a1eeecf2a6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1270,7 +1270,7 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 	ref_transaction_add_update(
 			transaction, r->name,
 			REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | REF_IS_PRUNING,
-			null_oid(), &r->oid, NULL, NULL, NULL);
+			null_oid(), &r->oid, NULL, NULL, NULL, NULL);
 	if (ref_transaction_commit(transaction, &err))
 		goto cleanup;
 
@@ -2417,7 +2417,7 @@ static int split_head_update(struct ref_update *update,
 			transaction, "HEAD",
 			update->flags | REF_LOG_ONLY | REF_NO_DEREF,
 			&update->new_oid, &update->old_oid,
-			NULL, NULL, update->msg);
+			NULL, NULL, update->committer_info, update->msg);
 
 	/*
 	 * Add "HEAD". This insertion is O(N) in the transaction
@@ -2481,7 +2481,8 @@ static int split_symref_update(struct ref_update *update,
 			transaction, referent, new_flags,
 			update->new_target ? NULL : &update->new_oid,
 			update->old_target ? NULL : &update->old_oid,
-			update->new_target, update->old_target, update->msg);
+			update->new_target, update->old_target, NULL,
+			update->msg);
 
 	new_update->parent_update = update;
 
@@ -2914,7 +2915,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					packed_transaction, update->refname,
 					REF_HAVE_NEW | REF_NO_DEREF,
 					&update->new_oid, NULL,
-					NULL, NULL, NULL);
+					NULL, NULL, NULL, NULL);
 		}
 	}
 
@@ -3094,12 +3095,13 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 			ref_transaction_add_update(loose_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, NULL);
+						   update->new_target, NULL, update->committer_info,
+						   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   &update->new_oid, &update->old_oid,
-						   NULL, NULL, NULL);
+						   NULL, NULL, update->committer_info, NULL);
 		}
 	}
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f5c733d099f0c6f1076a25f4f77d9d5eb345ec87..79b287c5ec5c7d8f759869cf93cda405640186dc 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -162,6 +162,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg);
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c008f20be719fec3af6a8f81c821cb9c263764d7..b2e3ba877de9e59fea5a4d066eb13e60ef22a32b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1078,7 +1078,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			new_update = ref_transaction_add_update(
 					transaction, "HEAD",
 					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
-					&u->new_oid, &u->old_oid, NULL, NULL, u->msg);
+					&u->new_oid, &u->old_oid, NULL, NULL, NULL,
+					u->msg);
 			string_list_insert(&affected_refnames, new_update->refname);
 		}
 
@@ -1161,7 +1162,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 					transaction, referent.buf, new_flags,
 					u->new_target ? NULL : &u->new_oid,
 					u->old_target ? NULL : &u->old_oid,
-					u->new_target, u->old_target, u->msg);
+					u->new_target, u->old_target,
+					u->committer_info, u->msg);
 
 				new_update->parent_update = u;
 

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (4 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Introduce a new function `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few more things. It:
  - Enforces that only a reflog entry is added and does not update the
  ref itself.
  - Allows the users to also provide the committer information. This
  means clients can add reflog entries with custom committer
  information.

The `transaction_refname_valid()` function also modifies the error
message selectively based on the type of the update. This change also
affects reflog updates which go through `ref_transaction_update()`.

A follow-up commit will utilize this function to add reflog support to
`git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
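The flag handling described above can be sketched on its own. The
`DEMO_*` constants and the three helper functions are hypothetical and
their bit values are made up; they are not git's definitions. The sketch
only mirrors the sequence in this patch: force "log only" and "no
dereference" before validation, drop the "have old" bit on the stored
update, and choose the error wording from the log-only bit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values for illustration; not git's real definitions. */
#define DEMO_NO_DEREF (1u << 0)
#define DEMO_HAVE_NEW (1u << 1)
#define DEMO_HAVE_OLD (1u << 2)
#define DEMO_LOG_ONLY (1u << 3)

/* Flags used while validating: always log-only and never dereferenced. */
static unsigned int demo_reflog_validation_flags(unsigned int flags)
{
	return flags | DEMO_LOG_ONLY | DEMO_NO_DEREF;
}

/*
 * Flags kept on the queued update: the old value is still recorded, but
 * the bit that would trigger old-value verification is cleared.
 */
static unsigned int demo_reflog_stored_flags(unsigned int flags)
{
	return demo_reflog_validation_flags(flags) & ~DEMO_HAVE_OLD;
}

/* Pick the noun phrase used in the "refusing to update ..." message. */
static const char *demo_bad_name_what(unsigned int flags)
{
	return (flags & DEMO_LOG_ONLY) ? "reflog with bad name"
				       : "ref with bad name";
}
```

Setting the log-only bit before validation is what lets the shared
validation helper pick the reflog wording without an extra parameter.
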
 refs.c               | 39 +++++++++++++++++++++++++++++++++++----
 refs.h               | 14 ++++++++++++++
 refs/files-backend.c | 24 ++++++++++++++++--------
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/refs.c b/refs.c
index 782bf1090af65196263a3c35ed18d878bb4f2967..8b3882cff17e5e3b0376f75654e32f81a23e5cb2 100644
--- a/refs.c
+++ b/refs.c
@@ -1207,14 +1207,14 @@ static int transaction_refname_valid(const char *refname,
 		return 1;
 
 	if (is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
+		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	} else if ((new_oid && !is_null_oid(new_oid)) ?
 		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
 		 !refname_is_safe(refname)) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+		const char *what = flags & REF_LOG_ONLY ? "reflog with bad name" : "ref with bad name";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	}
 
@@ -1257,6 +1257,37 @@ int ref_transaction_update(struct ref_transaction *transaction,
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
 				   old_target, NULL, msg);
+
+	return 0;
+}
+
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err)
+{
+	struct ref_update *update;
+
+	assert(err);
+
+	flags |= REF_LOG_ONLY | REF_NO_DEREF;
+
+	if (!transaction_refname_valid(refname, new_oid, flags, err))
+		return -1;
+
+	update = ref_transaction_add_update(transaction, refname, flags,
+					    new_oid, old_oid, NULL, NULL,
+					    committer_info, msg);
+	/*
+	 * While we do set the old_oid value, we unset the flag to skip
+	 * old_oid verification which only makes sense for refs.
+	 */
+	update->flags &= ~REF_HAVE_OLD;
+	update->index = index;
+
 	return 0;
 }
 
diff --git a/refs.h b/refs.h
index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..b0dfc65ed2e59c4b66967840339f81e7746a96d3 100644
--- a/refs.h
+++ b/refs.h
@@ -727,6 +727,20 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err);
 
+/*
+ * Similar to `ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information. The index
+ * field can be utilized to order updates as desired. When not used, the
+ * updates default to being ordered by refname.
+ */
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err);
+
 /*
  * Add a reference creation to transaction. new_oid is the value that
  * the reference should have after the update; it must not be
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 255fed8354cae982f785b1b85340e2a1eeecf2a6..c11213f52065bcf2fa7612df8f9500692ee2d02c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3080,10 +3080,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 
 		/*
-		 * packed-refs don't support symbolic refs and root refs, so we
-		 * have to queue these references via the loose transaction.
+		 * packed-refs don't support symbolic refs, root refs and reflogs,
+		 * so we have to queue these references via the loose transaction.
 		 */
-		if (update->new_target || is_root_ref(update->refname)) {
+		if (update->new_target ||
+		    is_root_ref(update->refname) ||
+		    (update->flags & REF_LOG_ONLY)) {
 			if (!loose_transaction) {
 				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
 				if (!loose_transaction) {
@@ -3092,11 +3094,17 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 				}
 			}
 
-			ref_transaction_add_update(loose_transaction, update->refname,
-						   update->flags & ~REF_HAVE_OLD,
-						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, update->committer_info,
-						   NULL);
+			if (update->flags & REF_LOG_ONLY)
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags, &update->new_oid,
+							   &update->old_oid, NULL, NULL,
+							   update->committer_info, update->msg);
+			else
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags & ~REF_HAVE_OLD,
+							   update->new_target ? NULL : &update->new_oid, NULL,
+							   update->new_target, NULL, update->committer_info,
+							   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (5 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-15 16:25     ` [PATCH v3 8/8] refs: add support for migrating reflogs Karthik Nayak
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.

So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.

In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a given
refname are added and as such in most scenarios the old behavior
remains.

This is required to add reflog migration support to `git refs migrate`.
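
To illustrate why distinct update indices matter, here is a minimal
sketch (not the reftable code). The struct, the comparator and the
helper are hypothetical stand-ins; in particular the comparator is a
simplification and does not reproduce the writer's real ordering. The
only point is that entries with identical sort keys have no defined
relative order after sorting, while `ts + index` gives every entry of a
refname its own key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a reflog record. */
struct log_entry {
	const char *refname;
	uint64_t update_index;
	const char *msg;
};

/* Stand-in comparator: by refname, then by update_index (ascending). */
static int log_entry_cmp(const void *a, const void *b)
{
	const struct log_entry *x = a, *y = b;
	int cmp = strcmp(x->refname, y->refname);

	if (cmp)
		return cmp;
	if (x->update_index != y->update_index)
		return x->update_index < y->update_index ? -1 : 1;
	return 0; /* equal keys: relative order after qsort() is unspecified */
}

/* Give each entry a distinct key (ts + index), then sort. */
static void assign_and_sort(struct log_entry *logs, size_t nr, uint64_t ts,
			    const unsigned int *index)
{
	size_t i;

	assert(logs && index);
	for (i = 0; i < nr; i++)
		logs[i].update_index = ts + index[i];
	qsort(logs, nr, sizeof(*logs), log_entry_cmp);
}
```

With all entries sharing the same `ts`, the sort above would be free to
emit them in any order; with the per-entry index added, the original
sequence can be recovered regardless of the order they were queued in.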

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 22 +++++++++++++++++++---
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}
 
 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				/*
+				 * Updates are sorted by the writer. So updates for the same
+				 * refname need to contain different update indices.
+				 */
+				log->update_index = ts + u->index;
+
+				/*
+				 * Note the max update_index so the limit can be set later on.
+				 */
+				if (log->update_index > max_update_index)
+					max_update_index = log->update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v3 8/8] refs: add support for migrating reflogs
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (6 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-15 16:25     ` Karthik Nayak
  2024-12-16  7:25       ` Patrick Steinhardt
  2024-12-15 23:54     ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Junio C Hamano
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-15 16:25 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit added support
for adding reflog updates in ref transactions. Using the added
functionality, bake in reflog support for `git refs migrate`.

To preserve the order of the reflog entries during the migration, we
assign an increasing index to each reflog update as we iterate over the
reflog entries of the old reference backend. The new backend can then
write the entries in the same order.
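
The per-entry indexing can be sketched with a mock iterator. Everything
below is hypothetical scaffolding for illustration (the `entry`,
`queued_update` and `reflog_cb_data` types as well as
`for_each_entry()` and `queue_one_entry()`); it only mirrors the shape
of the callback added in this patch and assumes the entries are visited
oldest-first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one reflog entry of the old backend. */
struct entry {
	const char *msg;
};

/* Hypothetical stand-in for a reflog update queued in a transaction. */
struct queued_update {
	const char *refname;
	const char *msg;
	unsigned int index;
};

struct reflog_cb_data {
	unsigned int index;          /* next position within this reflog */
	const char *refname;
	struct queued_update *queue; /* caller-provided storage */
	size_t nr, alloc;
};

typedef int (*each_entry_fn)(const struct entry *e, void *cb_data);

/* Walk the entries in order, stopping at the first non-zero return. */
static int for_each_entry(const struct entry *entries, size_t nr,
			  each_entry_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(&entries[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Queue one update and tag it with its position in the reflog. */
static int queue_one_entry(const struct entry *e, void *cb_data)
{
	struct reflog_cb_data *data = cb_data;
	struct queued_update *u;

	assert(data && e);
	if (data->nr == data->alloc)
		return -1; /* no room left in the caller's queue */
	u = &data->queue[data->nr++];
	u->refname = data->refname;
	u->msg = e->msg;
	u->index = data->index++;
	return 0;
}
```

Note that the counter lives in the per-reflog callback data, so it
must start from a known value for every refname.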

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-refs.txt |  2 --
 refs.c                     | 89 ++++++++++++++++++++++++++++++++--------------
 t/t1460-refs-migrate.sh    | 73 +++++++++++++++++++++++++------------
 3 files changed, 113 insertions(+), 51 deletions(-)

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index ce31f93061db5e5d16aca516dd3d15f6527db870..9829984b0a4c4f54ec7f9b6c6c7072f62b1d198d 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -57,8 +57,6 @@ KNOWN LIMITATIONS
 
 The ref format migration has several known limitations in its current form:
 
-* It is not possible to migrate repositories that have reflogs.
-
 * It is not possible to migrate repositories that have worktrees.
 
 * There is no way to block concurrent writes to the repository during an
diff --git a/refs.c b/refs.c
index 8b3882cff17e5e3b0376f75654e32f81a23e5cb2..4a74f7c7bd0314ad8e6c4cbea436df934b2c7f88 100644
--- a/refs.c
+++ b/refs.c
@@ -30,6 +30,7 @@
 #include "date.h"
 #include "commit.h"
 #include "wildmatch.h"
+#include "ident.h"
 
 /*
  * List of all available backends
@@ -2673,6 +2674,7 @@ struct migration_data {
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
 	struct strbuf *errbuf;
+	struct strbuf sb;
 };
 
 static int migrate_one_ref(const char *refname, const char *referent UNUSED, const struct object_id *oid,
@@ -2705,6 +2707,52 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 	return ret;
 }
 
+struct reflog_migration_data {
+	unsigned int index;
+	const char *refname;
+	struct ref_store *old_refs;
+	struct ref_transaction *transaction;
+	struct strbuf *errbuf;
+	struct strbuf *sb;
+};
+
+static int migrate_one_reflog_entry(struct object_id *old_oid,
+				    struct object_id *new_oid,
+				    const char *committer,
+				    timestamp_t timestamp, int tz,
+				    const char *msg, void *cb_data)
+{
+	struct reflog_migration_data *data = cb_data;
+	const char *date;
+	int ret;
+
+	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
+	strbuf_reset(data->sb);
+	/* committer contains name and email */
+	strbuf_addstr(data->sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
+
+	ret = ref_transaction_update_reflog(data->transaction, data->refname,
+					    new_oid, old_oid, data->sb->buf,
+					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
+					    data->index++, data->errbuf);
+	return ret;
+}
+
+static int migrate_one_reflog(const char *refname, void *cb_data)
+{
+	struct migration_data *migration_data = cb_data;
+	struct reflog_migration_data data;
+
+	data.refname = refname;
+	data.old_refs = migration_data->old_refs;
+	data.transaction = migration_data->transaction;
+	data.errbuf = migration_data->errbuf;
+	data.sb = &migration_data->sb;
+
+	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
+					migrate_one_reflog_entry, &data);
+}
+
 static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
 {
 	struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
@@ -2771,13 +2819,6 @@ static int move_files(const char *from_path, const char *to_path, struct strbuf
 	return ret;
 }
 
-static int count_reflogs(const char *reflog UNUSED, void *payload)
-{
-	size_t *reflog_count = payload;
-	(*reflog_count)++;
-	return 0;
-}
-
 static int has_worktrees(void)
 {
 	struct worktree **worktrees = get_worktrees();
@@ -2803,7 +2844,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	struct ref_transaction *transaction = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	struct migration_data data;
-	size_t reflog_count = 0;
 	int did_migrate_refs = 0;
 	int ret;
 
@@ -2815,21 +2855,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 
 	old_refs = get_main_ref_store(repo);
 
-	/*
-	 * We do not have any interfaces that would allow us to write many
-	 * reflog entries. Once we have them we can remove this restriction.
-	 */
-	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
-		strbuf_addstr(errbuf, "cannot count reflogs");
-		ret = -1;
-		goto done;
-	}
-	if (reflog_count) {
-		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
-		ret = -1;
-		goto done;
-	}
-
 	/*
 	 * Worktrees complicate the migration because every worktree has a
 	 * separate ref storage. While it should be feasible to implement, this
@@ -2855,17 +2880,21 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	 *      This operation is safe as we do not yet modify the main
 	 *      repository.
 	 *
-	 *   3. If we're in dry-run mode then we are done and can hand over the
+	 *   3. Enumerate all reflogs and write them into the new ref storage.
+	 *      This operation is safe as we do not yet modify the main
+	 *      repository.
+	 *
+	 *   4. If we're in dry-run mode then we are done and can hand over the
 	 *      directory to the caller for inspection. If not, we now start
 	 *      with the destructive part.
 	 *
-	 *   4. Delete the old ref storage from disk. As we have a copy of refs
+	 *   5. Delete the old ref storage from disk. As we have a copy of refs
 	 *      in the new ref storage it's okay(ish) if we now get interrupted
 	 *      as there is an equivalent copy of all refs available.
 	 *
-	 *   5. Move the new ref storage files into place.
+	 *   6. Move the new ref storage files into place.
 	 *
-	 *   6. Change the repository format to the new ref format.
+	 *   7. Change the repository format to the new ref format.
 	 */
 	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
 	if (!mkdtemp(new_gitdir.buf)) {
@@ -2889,6 +2918,7 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	data.old_refs = old_refs;
 	data.transaction = transaction;
 	data.errbuf = errbuf;
+	strbuf_init(&data.sb, 0);
 
 	/*
 	 * We need to use the internal `do_for_each_ref()` here so that we can
@@ -2907,6 +2937,10 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
+	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+	if (ret < 0)
+		goto done;
+
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
 		goto done;
@@ -2982,6 +3016,7 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	}
 	ref_transaction_free(transaction);
 	strbuf_release(&new_gitdir);
+	strbuf_release(&data.sb);
 	return ret;
 }
 
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index 1bfff3a7afd5acc470424dfe7ec3e97d45f5c481..f59bc4860f19c4af82dc6f2984bdb69d61fe3ec2 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -7,23 +7,44 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
+# Migrate the provided repository from one format to the other and
+# verify that the references and logs are migrated over correctly.
+# Usage: test_migration <repo> <format> <skip_reflog_verify>
+#   <repo> is the relative path to the repo to be migrated.
+#   <format> is the ref format to be migrated to.
+#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
 test_migration () {
-	git -C "$1" for-each-ref --include-root-refs \
+	repo=$1 &&
+	format=$2 &&
+	skip_reflog_verify=${3:-false} &&
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
-	git -C "$1" refs migrate --ref-format="$2" &&
-	git -C "$1" for-each-ref --include-root-refs \
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >expect_logs &&
+		git -C "$repo" reflog list >expect_log_list
+	fi &&
+
+	git -C "$repo" refs migrate --ref-format="$format" &&
+
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
 	test_cmp expect actual &&
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >actual_logs &&
+		git -C "$repo" reflog list >actual_log_list &&
+		test_cmp expect_logs actual_logs &&
+		test_cmp expect_log_list actual_log_list
+	fi &&
 
-	git -C "$1" rev-parse --show-ref-format >actual &&
-	echo "$2" >expect &&
+	git -C "$repo" rev-parse --show-ref-format >actual &&
+	echo "$format" >expect &&
 	test_cmp expect actual
 }
 
 test_expect_success 'setup' '
-	rm -rf .git &&
-	# The migration does not yet support reflogs.
-	git config --global core.logAllRefUpdates false
+	rm -rf .git
 '
 
 test_expect_success "superfluous arguments" '
@@ -78,19 +99,6 @@ do
 			test_cmp expect err
 		'
 
-		test_expect_success "$from_format -> $to_format: migration with reflog fails" '
-			test_when_finished "rm -rf repo" &&
-			git init --ref-format=$from_format repo &&
-			test_config -C repo core.logAllRefUpdates true &&
-			test_commit -C repo logged &&
-			test_must_fail git -C repo refs migrate \
-				--ref-format=$to_format 2>err &&
-			cat >expect <<-EOF &&
-			error: migrating reflogs is not supported yet
-			EOF
-			test_cmp expect err
-		'
-
 		test_expect_success "$from_format -> $to_format: migration with worktree fails" '
 			test_when_finished "rm -rf repo" &&
 			git init --ref-format=$from_format repo &&
@@ -141,7 +149,7 @@ do
 			test_commit -C repo initial &&
 			test-tool -C repo ref-store main update-ref "" refs/heads/broken \
 				"$(test_oid 001)" "$ZERO_OID" REF_SKIP_CREATE_REFLOG,REF_SKIP_OID_VERIFICATION &&
-			test_migration repo "$to_format" &&
+			test_migration repo "$to_format" true &&
 			test_oid 001 >expect &&
 			git -C repo rev-parse refs/heads/broken >actual &&
 			test_cmp expect actual
@@ -195,6 +203,27 @@ do
 			git -C repo rev-parse --show-ref-format >actual &&
 			test_cmp expect actual
 		'
+
+		test_expect_success "$from_format -> $to_format: reflogs of symrefs with target deleted" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			git -C repo branch branch-1 HEAD &&
+			git -C repo symbolic-ref refs/heads/symref refs/heads/branch-1 &&
+			cat >input <<-EOF &&
+			delete refs/heads/branch-1
+			EOF
+			git -C repo update-ref --stdin <input &&
+			test_migration repo "$to_format"
+		'
+
+		test_expect_success "$from_format -> $to_format: reflogs order is retained" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit --date "100005000 +0700" --no-tag -C repo initial &&
+			test_commit --date "100003000 +0700" --no-tag -C repo second &&
+			test_migration repo "$to_format"
+		'
 	done
 done
 

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/8] refs: add reflog support to `git refs migrate`
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (7 preceding siblings ...)
  2024-12-15 16:25     ` [PATCH v3 8/8] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-15 23:54     ` Junio C Hamano
  2024-12-16 14:33       ` karthik nayak
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
  9 siblings, 1 reply; 93+ messages in thread
From: Junio C Hamano @ 2024-12-15 23:54 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> Overall, this series is a bit more involved, and I would appreciate it
> if it receives a bit more scrutiny.
>
> The series is based on top of e66fd72e97 (The fourteenth batch,
> 2024-12-06) with `kn/reftable-writer-log-write-verify` merged in.

t1460.6 does not pass for me.  I noticed it after I merged it to
'seen', but it fails standalone as well.

  $ git log --first-parent --boundary master..kn/reflog-migration
  65a841566a refs: add support for migrating reflogs
  44ffb130f3 refs: allow multiple reflog entries for the same refname
  6134624eaf refs: introduce the `ref_transaction_update_reflog` function
  65a73fce30 refs: add `committer_info` to `ref_transaction_add_update()`
  5ed33e32c7 refs: extract out refname verification in transactions
  e9851924a2 refs/files: add count field to ref_lock
  cdfa2c379a refs: add `index` field to `struct ref_udpate`
  ee4d52c7f2 refs: include committer info in `ref_update` struct
  df5d7a7ba5 Merge branch 'kn/reftable-writer-log-write-verify' into kn/reflog-migration
  - 49c6b912e2 reftable/writer: ensure valid range for log's update_index
  - e66fd72e97 The fourteenth batch

...
ok 5 - files: migration to same format fails

expecting success of 1460.6 'files -> reftable: migration with worktree fails':
                        test_when_finished "rm -rf repo" &&
                        git init --ref-format=$from_format repo &&
                        git -C repo worktree add wt &&
                        test_must_fail git -C repo refs migrate \
                                --ref-format=$to_format 2>err &&
                        cat >expect <<-EOF &&
                        error: migrating repositories with worktrees is not supported yet
                        EOF
                        test_cmp expect err

++ test_when_finished 'rm -rf repo'
++ test 0 = 0
++ test_cleanup='{ rm -rf repo
                } && (exit "$eval_ret"); eval_ret=$?; :'
++ git init --ref-format=files repo
Initialized empty Git repository in /home/gitster/w/git.git/t/trash directory.t1460-refs-migrate/repo/.git/
++ git -C repo worktree add wt
No possible source branch, inferring '--orphan'
Preparing worktree (new branch 'wt')
++ test_must_fail git -C repo refs migrate --ref-format=reftable
++ case "$1" in
++ _test_ok=
++ test_must_fail_acceptable git -C repo refs migrate --ref-format=reftable
++ test git = env
++ test git = nongit
++ case "$1" in
++ return 0
++ git -C repo refs migrate --ref-format=reftable
/home/gitster/w/git.git/t/test-lib-functions.sh: line 1175: 3403665 Aborted                 "$@" 2>&7
++ exit_code=134
++ test 134 -eq 0
++ test_match_signal 13 134
++ test 134 = 141
++ test 134 = 269
++ return 1
++ test 134 -gt 129
++ test 134 -le 192
++ echo 'test_must_fail: died by signal 6: git -C repo refs migrate --ref-format=reftable'
test_must_fail: died by signal 6: git -C repo refs migrate --ref-format=reftable
++ return 1
error: last command exited with $?=1
not ok 6 - files -> reftable: migration with worktree fails
#
#                               test_when_finished "rm -rf repo" &&
#                               git init --ref-format=$from_format repo &&
#                               git -C repo worktree add wt &&
#                               test_must_fail git -C repo refs migrate \
#                                       --ref-format=$to_format 2>err &&
#                               cat >expect <<-EOF &&
#                               error: migrating repositories with worktrees is not supported yet
#                               EOF
#                               test_cmp expect err
#
1..6

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 8/8] refs: add support for migrating reflogs
  2024-12-15 16:25     ` [PATCH v3 8/8] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-16  7:25       ` Patrick Steinhardt
  2024-12-16 15:50         ` Junio C Hamano
  0 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-16  7:25 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Sun, Dec 15, 2024 at 05:25:45PM +0100, Karthik Nayak wrote:
> @@ -2705,6 +2707,52 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
>  	return ret;
>  }
>  
> +struct reflog_migration_data {
> +	unsigned int index;
> +	const char *refname;
> +	struct ref_store *old_refs;
> +	struct ref_transaction *transaction;
> +	struct strbuf *errbuf;
> +	struct strbuf *sb;
> +};
> +
> +static int migrate_one_reflog_entry(struct object_id *old_oid,
> +				    struct object_id *new_oid,
> +				    const char *committer,
> +				    timestamp_t timestamp, int tz,
> +				    const char *msg, void *cb_data)
> +{
> +	struct reflog_migration_data *data = cb_data;
> +	const char *date;
> +	int ret;
> +
> +	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
> +	strbuf_reset(data->sb);
> +	/* committer contains name and email */
> +	strbuf_addstr(data->sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
> +
> +	ret = ref_transaction_update_reflog(data->transaction, data->refname,
> +					    new_oid, old_oid, data->sb->buf,
> +					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
> +					    data->index++, data->errbuf);
> +	return ret;
> +}
> +
> +static int migrate_one_reflog(const char *refname, void *cb_data)
> +{
> +	struct migration_data *migration_data = cb_data;
> +	struct reflog_migration_data data;
> +
> +	data.refname = refname;
> +	data.old_refs = migration_data->old_refs;
> +	data.transaction = migration_data->transaction;
> +	data.errbuf = migration_data->errbuf;
> +	data.sb = &migration_data->sb;

The `index` variable isn't getting initialized here anymore, so its
value is essentially random. I'd propose to use designated initializers
for `data` to fix this:

    struct reflog_migration_data data = {
        .refname = refname,
        .old_refs = migration_data->old_refs,
        .transaction = migration_data->transaction,
        .errbuf = migration_data->errbuf,
        .sb = &migration_data->sb,
    };

Maybe that fixes the issue that Junio has seen?
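
For reference, a small standalone sketch of the C rule this relies on:
members that are not named in a designated initializer are initialized
as if they had static storage duration, i.e. to zero. The struct below
is a trimmed-down, hypothetical version of the one in the patch, and
`make_data()` is a helper invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down, hypothetical version of the struct from the patch. */
struct reflog_migration_data {
	unsigned int index;
	const char *refname;
};

/*
 * With member-wise assignment after a plain declaration, `index` would
 * never be written and its value would be indeterminate. With a
 * designated initializer, every member that is not named is zeroed.
 */
static struct reflog_migration_data make_data(const char *refname)
{
	struct reflog_migration_data data = {
		.refname = refname,
	};

	assert(refname);
	return data;
}
```

So `data.index` is 0 here without being mentioned explicitly.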

Patrick

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/8] refs: add reflog support to `git refs migrate`
  2024-12-15 23:54     ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Junio C Hamano
@ 2024-12-16 14:33       ` karthik nayak
  2024-12-16 16:32         ` Junio C Hamano
  0 siblings, 1 reply; 93+ messages in thread
From: karthik nayak @ 2024-12-16 14:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Overall, this series is a bit more involved, and I would appreciate it
>> if it receives a bit more scrutiny.
>>
>> The series is based on top of e66fd72e97 (The fourteenth batch,
>> 2024-12-06) with `kn/reftable-writer-log-write-verify` merged in.
>
> t1460.6 does not pass for me.  I noticed it after I merged it to
> 'seen', but it fails standalone as well.
>

Thanks Junio, seems like this passes on GCC and that is what I was
using. Sadly, it also passes on an older clang version, which is what the
CI uses. Unfortunately I assumed that the CI passing [1] should be
validation enough. But I can indeed reproduce this locally with clang.

Patrick posted a fix on the list [2] and also discovered one more while
we were discussing off the list. I'll send in the next version with both
of those included once I validate all the tests once more.

[1]: https://gitlab.com/gitlab-org/git/-/pipelines/1589854339
[2]: https://lore.kernel.org/r/Z1_KzlKc7RBfas4L@pks.im

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 8/8] refs: add support for migrating reflogs
  2024-12-16  7:25       ` Patrick Steinhardt
@ 2024-12-16 15:50         ` Junio C Hamano
  2024-12-16 15:59           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Junio C Hamano @ 2024-12-16 15:50 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

>> +static int migrate_one_reflog(const char *refname, void *cb_data)
>> +{
>> +	struct migration_data *migration_data = cb_data;
>> +	struct reflog_migration_data data;
>> +
>> +	data.refname = refname;
>> +	data.old_refs = migration_data->old_refs;
>> +	data.transaction = migration_data->transaction;
>> +	data.errbuf = migration_data->errbuf;
>> +	data.sb = &migration_data->sb;
>
> The `index` variable isn't getting initialized here anymore, so its
> value is essentially random. I'd propose to use designated initializers
> for `data` to fix this:
>
>     struct reflog_migration_data data = {
>         .refname = refname,
>         .old_refs = migration_data->old_refs,
>         .transaction = migration_data->transaction,
>         .errbuf = migration_data->errbuf,
>         .sb = &migration_data->sb,
>     };

Good.  As long as it is sensible to null-initialize the relevant
field and all the other fields not mentioned above, that certainly
would give us more predictable behaviour ;-).  I do not offhand
know if 0 is the right value to initialize the .index member with,
though; didn't you two recently have an exchange about starting with
0 or 1 or something?

Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 8/8] refs: add support for migrating reflogs
  2024-12-16 15:50         ` Junio C Hamano
@ 2024-12-16 15:59           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-16 15:59 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt; +Cc: git, Christian Couder

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>>> +static int migrate_one_reflog(const char *refname, void *cb_data)
>>> +{
>>> +	struct migration_data *migration_data = cb_data;
>>> +	struct reflog_migration_data data;
>>> +
>>> +	data.refname = refname;
>>> +	data.old_refs = migration_data->old_refs;
>>> +	data.transaction = migration_data->transaction;
>>> +	data.errbuf = migration_data->errbuf;
>>> +	data.sb = &migration_data->sb;
>>
>> The `index` variable isn't getting initialized here anymore, so its
>> value is essentially random. I'd propose to use designated initializers
>> for `data` to fix this:
>>
>>     struct reflog_migration_data data = {
>>         .refname = refname,
>>         .old_refs = migration_data->old_refs,
>>         .transaction = migration_data->transaction,
>>         .errbuf = migration_data->errbuf,
>>         .sb = &migration_data->sb,
>>     };
>
> Good.  As long as it is sensible to null-initialize the relevant
> field and all the other fields not mentioned above, that certainly
> would give us more predictable behaviour ;-).  I do not offhand
> know if 0 is the right value to initialize the .index member with,
> though; didn't you two recently have an exchange about starting with
> 0 or 1 or something?
>

We did indeed [1]. The context is that I had set it to '1' to
distinguish between indexed and non-indexed updates. But that wasn't
logically needed and was confusing, so I decided to remove the change
(which caused the issue here!).

[1]: https://lore.kernel.org/r/CAOLa=ZQ9SHD3gzTVaznGhkCBjrrJbHm1fDyi1F-h6VZvtdpxgw@mail.gmail.com

> Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v3 0/8] refs: add reflog support to `git refs migrate`
  2024-12-16 14:33       ` karthik nayak
@ 2024-12-16 16:32         ` Junio C Hamano
  0 siblings, 0 replies; 93+ messages in thread
From: Junio C Hamano @ 2024-12-16 16:32 UTC (permalink / raw)
  To: karthik nayak; +Cc: git, ps, Christian Couder

karthik nayak <karthik.188@gmail.com> writes:

> Patrick posted a fix on the list [1] and also discovered one more while
> we were discussing off the list. I'll send in the next version with both
> of those included once I validate all the tests once more.

Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
                       ` (8 preceding siblings ...)
  2024-12-15 23:54     ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Junio C Hamano
@ 2024-12-16 16:44     ` Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
                         ` (9 more replies)
  9 siblings, 10 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the feature was that it didn't support migrating
repositories which contained reflogs. This isn't a requirement on the
server side as repositories are stored as bare repositories (which do
not contain any reflogs). Clients however generally use reflogs and
until now couldn't use the `git refs migrate` command to migrate their
repositories to the new reftable format.

One of the issues with adding reflog support is that ref transactions
don't support reflog additions:
  1. While there is a REF_LOG_ONLY flag, there is no function to utilize
  the flag and add reflogs.
  2. Reference backends generally sort the updates by the refname. This
  wouldn't work for reflogs, which need to ensure that they maintain the
  order of creation.
  3. In the files backend, reflog entries are added by obtaining locks
  on the refs themselves. This means each update in the transaction will
  obtain a ref_lock. This paradigm fails to accommodate the fact that
  there could be multiple reflog updates for a refname in a single
  transaction.
  4. The backends check for duplicate entries, which doesn't make sense
  in the context of adding multiple reflogs for a given refname.

To overcome these issues, we make the following changes:
  - Update the ref_update structure to also include the committer
  information. Using this, we can add a new function which only adds
  reflog updates to the transaction.
  - Add an index field to the ref_update structure. This helps order
  updates in a pre-defined order, which fixes #2.
  - While the ideal fix for #3 would be to actually introduce reflog
  locks, this wouldn't be possible without breaking backward
  compatibility. So we add a count field to the existing ref_lock. With
  this, multiple reflog updates can share a single ref_lock.

Overall, this series is a bit more involved, and I would appreciate it
if it receives a bit more scrutiny.

The series is based on top of e66fd72e97 (The fourteenth batch,
2024-12-06) with `kn/reftable-writer-log-write-verify` merged in.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v4:
- Fix broken tests, due to two reasons in patch 8/8:
  - The `index` field in `reflog_migration_data` wasn't initialized to
    0. This specifically doesn't break the test, but causes undefined
    behavior. Fix this by using designated initializers.
  - The strbuf within `migration_data` wasn't initialized when the flow
    exited early, causing memory leaks. Fix this too by using designated
    initializers.
- Thanks to Junio for reporting and Patrick for helping shed some light
  on these broken tests.
- Link to v3: https://lore.kernel.org/r/20241215-320-git-refs-migrate-reflogs-v3-0-4127fe707b98@gmail.com

Changes in v3:
- patch 5: Use `xstrdup_or_null` unconditionally.
- patch 6: In `transaction_refname_valid()` use the transaction flags
  to identify reflogs. Update the documentation to also mention the
  purpose of the `index` field.
- patch 8: Instead of allocating an strbuf for each reflog entry, we
  store and re-use one in the migration callback data.
- patch 8: Don't use a global index increment for all reflog entries,
  instead create and use one per reflog.
- patch 8: Avoid setting the first reflog index to `1`. This would default
  to `0` as the first index, which is okay, since the index is incremented
  for consecutive reflog entries.
- Small typo fixes.
- Thanks to Christian and Patrick for the review!
- Link to v2: https://lore.kernel.org/all/20241213-320-git-refs-migrate-reflogs-v2-0-f28312cdb6c0@gmail.com/

Changes in v2:
- Split patch 5 into two separate patches. This should make it easier to
  review and reduce cognitive load in a single patch.
- In the reftable backend, instead of using `strmapint` to ensure we have
  new update_indexes for reflogs with the same refname, we now use the
  already available `update->index` field to increment the update_index.
- Clean up the code and follow some of the better practices.
- Add some clarity to the commit messages.
- Link to v1: https://lore.kernel.org/r/20241209-320-git-refs-migrate-reflogs-v1-0-d4bc37ee860f@gmail.com

---
Karthik Nayak (8):
      refs: include committer info in `ref_update` struct
      refs: add `index` field to `struct ref_udpate`
      refs/files: add count field to ref_lock
      refs: extract out refname verification in transactions
      refs: add `committer_info` to `ref_transaction_add_update()`
      refs: introduce the `ref_transaction_update_reflog` function
      refs: allow multiple reflog entries for the same refname
      refs: add support for migrating reflogs

 Documentation/git-refs.txt |   2 -
 refs.c                     | 168 +++++++++++++++++++++++++++++++++------------
 refs.h                     |  14 ++++
 refs/files-backend.c       | 131 +++++++++++++++++++++++------------
 refs/refs-internal.h       |   9 +++
 refs/reftable-backend.c    |  53 +++++++++++---
 t/t1460-refs-migrate.sh    |  73 ++++++++++++++------
 7 files changed, 329 insertions(+), 121 deletions(-)
---

Range-diff versus v3:

1:  7989ca0679 = 1:  34fb6a475e refs: include committer info in `ref_update` struct
2:  12acd7b4bb = 2:  4badd2b8ec refs: add `index` field to `struct ref_udpate`
3:  0d13c0d09b = 3:  0d8673c2fe refs/files: add count field to ref_lock
4:  d4073cd9dc = 4:  dec4b3c6f6 refs: extract out refname verification in transactions
5:  5a3d242955 = 5:  b1cc1bb242 refs: add `committer_info` to `ref_transaction_add_update()`
6:  fedd93f113 = 6:  f669e87498 refs: introduce the `ref_transaction_update_reflog` function
7:  b4465ee0c5 = 7:  53c0d2b62b refs: allow multiple reflog entries for the same refname
8:  8760610904 ! 8:  1cf30a5c3a refs: add support for migrating reflogs
    @@ Commit message
         the reflogs from the old reference backend. This is to ensure that the
         order is maintained in the new backend.
     
    +    Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## Documentation/git-refs.txt ##
    @@ refs.c: static int migrate_one_ref(const char *refname, const char *referent UNU
     +static int migrate_one_reflog(const char *refname, void *cb_data)
     +{
     +	struct migration_data *migration_data = cb_data;
    -+	struct reflog_migration_data data;
    -+
    -+	data.refname = refname;
    -+	data.old_refs = migration_data->old_refs;
    -+	data.transaction = migration_data->transaction;
    -+	data.errbuf = migration_data->errbuf;
    -+	data.sb = &migration_data->sb;
    ++	struct reflog_migration_data data = {
    ++		.refname = refname,
    ++		.old_refs = migration_data->old_refs,
    ++		.transaction = migration_data->transaction,
    ++		.errbuf = migration_data->errbuf,
    ++		.sb = &migration_data->sb,
    ++	};
     +
     +	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
     +					migrate_one_reflog_entry, &data);
    @@ refs.c: static int move_files(const char *from_path, const char *to_path, struct
      {
      	struct worktree **worktrees = get_worktrees();
     @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
    + 	struct ref_store *old_refs = NULL, *new_refs = NULL;
      	struct ref_transaction *transaction = NULL;
      	struct strbuf new_gitdir = STRBUF_INIT;
    - 	struct migration_data data;
    +-	struct migration_data data;
     -	size_t reflog_count = 0;
    ++	struct migration_data data = {
    ++		.sb = STRBUF_INIT,
    ++	};
      	int did_migrate_refs = 0;
      	int ret;
      
    @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	 */
      	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
      	if (!mkdtemp(new_gitdir.buf)) {
    -@@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
    - 	data.old_refs = old_refs;
    - 	data.transaction = transaction;
    - 	data.errbuf = errbuf;
    -+	strbuf_init(&data.sb, 0);
    - 
    - 	/*
    - 	 * We need to use the internal `do_for_each_ref()` here so that we can
     @@ refs.c: int repo_migrate_ref_storage_format(struct repository *repo,
      	if (ret < 0)
      		goto done;


--- 

base-commit: 09245f4b75863f4e94dac7feebaafce53a26965f
change-id: 20241111-320-git-refs-migrate-reflogs-a53e3a6cffc9

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH v4 1/8] refs: include committer info in `ref_update` struct
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
                         ` (8 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface for creating reflogs, including custom
committer information.

Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.
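
As an illustration of the fallback described above, here is a minimal,
self-contained sketch. It is not the code from this patch: the function
names are hypothetical, `demo_default_committer()` merely stands in for
`git_committer_info(0)`, and a fixed-size buffer replaces git's strbuf.

```c
#include <stdio.h>

/* Hypothetical stand-in for git_committer_info(0). */
static const char *demo_default_committer(void)
{
	return "A U Thor <author@example.com> 1700000000 +0000";
}

/*
 * Format one reflog line into `buf`. When no committer is supplied
 * the default identity is looked up at write time, mirroring the
 * NULL check this patch adds to log_ref_write_fd().
 * Returns the value of snprintf().
 */
static int demo_format_reflog_line(char *buf, size_t len,
				   const char *old_hex, const char *new_hex,
				   const char *committer, const char *msg)
{
	if (!committer)
		committer = demo_default_committer();

	if (msg && *msg)
		return snprintf(buf, len, "%s %s %s\t%s\n",
				old_hex, new_hex, committer, msg);
	return snprintf(buf, len, "%s %s %s\n", old_hex, new_hex, committer);
}
```

Passing NULL keeps the existing behaviour for regular updates, while a
migration can pass the identity recorded in the old reflog entry.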

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  1 +
 refs/files-backend.c    | 24 ++++++++++++++----------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 12 +++++++++++-
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/refs.c b/refs.c
index 762f3e324d59c60cd4f05c2f257e54de8deb00e5..f003e51c6bf5229bfbce8ce61ffad7cdba0572e0 100644
--- a/refs.c
+++ b/refs.c
@@ -1151,6 +1151,7 @@ void ref_transaction_free(struct ref_transaction *transaction)
 
 	for (i = 0; i < transaction->nr; i++) {
 		free(transaction->updates[i]->msg);
+		free(transaction->updates[i]->committer_info);
 		free((char *)transaction->updates[i]->new_target);
 		free((char *)transaction->updates[i]->old_target);
 		free(transaction->updates[i]);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 64f51f0da905a9a8a1ac4109c6b0a9a85a355db7..6078668c99ee254e794e3ba49689aa34e6022efd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1858,6 +1858,9 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 	struct strbuf sb = STRBUF_INIT;
 	int ret = 0;
 
+	if (!committer)
+		committer = git_committer_info(0);
+
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg) {
 		strbuf_addch(&sb, '\t');
@@ -1871,8 +1874,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 }
 
 static int files_log_ref_write(struct files_ref_store *refs,
-			       const char *refname, const struct object_id *old_oid,
-			       const struct object_id *new_oid, const char *msg,
+			       const char *refname,
+			       const struct object_id *old_oid,
+			       const struct object_id *new_oid,
+			       const char *committer_info, const char *msg,
 			       int flags, struct strbuf *err)
 {
 	int logfd, result;
@@ -1889,8 +1894,7 @@ static int files_log_ref_write(struct files_ref_store *refs,
 
 	if (logfd < 0)
 		return 0;
-	result = log_ref_write_fd(logfd, old_oid, new_oid,
-				  git_committer_info(0), msg);
+	result = log_ref_write_fd(logfd, old_oid, new_oid, committer_info, msg);
 	if (result) {
 		struct strbuf sb = STRBUF_INIT;
 		int save_errno = errno;
@@ -1974,8 +1978,7 @@ static int commit_ref_update(struct files_ref_store *refs,
 	files_assert_main_repository(refs, "commit_ref_update");
 
 	clear_loose_ref_cache(refs);
-	if (files_log_ref_write(refs, lock->ref_name,
-				&lock->old_oid, oid,
+	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid, oid, NULL,
 				logmsg, flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 		strbuf_addf(err, "cannot update the ref '%s': %s",
@@ -2007,9 +2010,9 @@ static int commit_ref_update(struct files_ref_store *refs,
 		if (head_ref && (head_flag & REF_ISSYMREF) &&
 		    !strcmp(head_ref, lock->ref_name)) {
 			struct strbuf log_err = STRBUF_INIT;
-			if (files_log_ref_write(refs, "HEAD",
-						&lock->old_oid, oid,
-						logmsg, flags, &log_err)) {
+			if (files_log_ref_write(refs, "HEAD", &lock->old_oid,
+						oid, NULL, logmsg, flags,
+						&log_err)) {
 				error("%s", log_err.buf);
 				strbuf_release(&log_err);
 			}
@@ -2969,7 +2972,8 @@ static int parse_and_write_reflog(struct files_ref_store *refs,
 	}
 
 	if (files_log_ref_write(refs, lock->ref_name, &lock->old_oid,
-				&update->new_oid, update->msg, update->flags, err)) {
+				&update->new_oid, update->committer_info,
+				update->msg, update->flags, err)) {
 		char *old_msg = strbuf_detach(err, NULL);
 
 		strbuf_addf(err, "cannot update the ref '%s': %s",
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 58aa56d1b27c85d606ed7c8c0d908e4b87d1066b..0fd95cdacd99e4a728c22f5286f6b3f0f360c110 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -113,6 +113,7 @@ struct ref_update {
 	void *backend_data;
 	unsigned int type;
 	char *msg;
+	char *committer_info;
 
 	/*
 	 * If this ref_update was split off of a symref update via
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 647ef9b05b1dc9a376ed054330b487f7595c5caa..e882602487c66261d586a94101bb1b4e9a2ed60e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1379,11 +1379,21 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			}
 
 			if (create_reflog) {
+				struct ident_split c;
+
 				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 				log = &logs[logs_nr++];
 				memset(log, 0, sizeof(*log));
 
-				fill_reftable_log_record(log, &committer_ident);
+				if (u->committer_info) {
+					if (split_ident_line(&c, u->committer_info,
+							     strlen(u->committer_info)))
+						BUG("failed splitting committer info");
+				} else {
+					c = committer_ident;
+				}
+
+				fill_reftable_log_record(log, &c);
 				log->update_index = ts;
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate`
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-19 19:28         ` Toon Claes
  2024-12-16 16:44       ` [PATCH v4 3/8] refs/files: add count field to ref_lock Karthik Nayak
                         ` (7 subsequent siblings)
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reftable backend sorts its updates by refname before applying them.
This ensures that the references are stored sorted. When migrating
reflogs from one backend to another, the order of the reflogs must be
maintained. Add a new `index` field to the `ref_update` struct to
facilitate this.

This field is used in the reftable backend's sort comparison function
`transaction_update_cmp`, to ensure that indexed updates maintain their
order.
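
To illustrate the ordering rule, here is a minimal standalone sketch of
such a comparison function. It is an assumption-laden illustration, not
the patch itself: `struct demo_update` and both function names are
hypothetical, and the indexes are compared explicitly rather than
subtracted, so that the unsigned values never wrap when converted to
`int`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for `struct ref_update`. */
struct demo_update {
	const char *refname;
	unsigned int index;
};

/*
 * If either update carries a non-zero index, order by index;
 * otherwise fall back to ordering by refname.
 */
static int demo_update_cmp(const void *a, const void *b)
{
	const struct demo_update *ua = a;
	const struct demo_update *ub = b;

	if (ua->index || ub->index)
		return (ua->index > ub->index) - (ua->index < ub->index);

	return strcmp(ua->refname, ub->refname);
}

static void demo_sort_updates(struct demo_update *updates, size_t nr)
{
	qsort(updates, nr, sizeof(*updates), demo_update_cmp);
}
```

For several reflog entries of the same refname this keeps them in index
order, which is the property needed when migrating reflogs.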

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/refs-internal.h    |  7 +++++++
 refs/reftable-backend.c | 13 +++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0fd95cdacd99e4a728c22f5286f6b3f0f360c110..f5c733d099f0c6f1076a25f4f77d9d5eb345ec87 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -115,6 +115,13 @@ struct ref_update {
 	char *msg;
 	char *committer_info;
 
+	/*
+	 * The index overrides the default sort algorithm. This is needed
+	 * when migrating reflogs and we want to ensure we carry over the
+	 * same order.
+	 */
+	unsigned int index;
+
 	/*
 	 * If this ref_update was split off of a symref update via
 	 * split_symref_update(), then this member points at that
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
 
 static int transaction_update_cmp(const void *a, const void *b)
 {
-	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
-		      ((struct reftable_transaction_update *)b)->update->refname);
+	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
+	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
+
+	/*
+	 * If there is an index set, it should take preference (default is 0).
+	 * This ensures that updates with indexes are sorted amongst themselves.
+	 */
+	if (update_a->update->index || update_b->update->index)
+		return update_a->update->index - update_b->update->index;
+
+	return strcmp(update_a->update->refname, update_b->update->refname);
 }
 
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v4 3/8] refs/files: add count field to ref_lock
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-16 16:44       ` [PATCH v4 4/8] refs: extract out refname verification in transactions Karthik Nayak
                         ` (6 subsequent siblings)
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.

The upcoming patches will add support for reflog updates in ref
transactions. This means that a single transaction may contain both ref
updates and reflog updates. For a given ref in a given transaction there
can be at most one update. But we can theoretically have multiple reflog
updates for a given ref in a given transaction. A great example of this
would be when migrating reflogs from one backend to another. There we
would batch all the reflog updates for a given reference in a single
transaction.

The current flow does not support this, because currently refs & reflogs
are treated as a single entity and capture the lock together. To
separate this, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.

This patch only adds the `count` field to `ref_lock` and the logic to
increment and decrement it. In a follow-up commit, we'll separate the
reflog update logic from ref updates and utilize this functionality.
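
The counting scheme can be sketched in isolation as follows. This is a
simplified illustration under stated assumptions, not the files-backend
code: `struct demo_lock` and both functions are hypothetical, no lock
file is involved, and the caller passes in the already-held lock
instead of looking it up in a per-transaction map.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for `struct ref_lock`. */
struct demo_lock {
	char *ref_name;
	unsigned int count; /* number of updates sharing this lock */
};

/*
 * Reuse `existing` if the refname is already locked in this
 * transaction, otherwise create a new lock with a count of 1.
 * Returns NULL on allocation failure.
 */
static struct demo_lock *demo_lock_acquire(struct demo_lock *existing,
					   const char *refname)
{
	struct demo_lock *lock;
	size_t len;

	if (existing) {
		existing->count++;
		return existing;
	}

	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return NULL;

	len = strlen(refname) + 1;
	lock->ref_name = malloc(len);
	if (!lock->ref_name) {
		free(lock);
		return NULL;
	}
	memcpy(lock->ref_name, refname, len);
	lock->count = 1;
	return lock;
}

/* Drop one user; returns 1 once the last user released the lock. */
static int demo_lock_release(struct demo_lock *lock)
{
	lock->count--;
	if (lock->count)
		return 0;

	free(lock->ref_name);
	free(lock);
	return 1;
}
```

The underlying resource is only torn down by the last release, so a ref
update and any number of reflog updates can share one lock.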

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c | 58 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6078668c99ee254e794e3ba49689aa34e6022efd..02cb4907d8659e87a227fed4f60a5f6606be8764 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -71,6 +71,7 @@ struct ref_lock {
 	char *ref_name;
 	struct lock_file lk;
 	struct object_id old_oid;
+	unsigned int count; /* track users of the lock (ref update + reflog updates) */
 };
 
 struct files_ref_store {
@@ -638,9 +639,12 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 
 static void unlock_ref(struct ref_lock *lock)
 {
-	rollback_lock_file(&lock->lk);
-	free(lock->ref_name);
-	free(lock);
+	lock->count--;
+	if (!lock->count) {
+		rollback_lock_file(&lock->lk);
+		free(lock->ref_name);
+		free(lock);
+	}
 }
 
 /*
@@ -696,6 +700,7 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	*lock_p = CALLOC_ARRAY(lock, 1);
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 	files_ref_path(refs, &ref_file, refname);
 
 retry:
@@ -1169,6 +1174,7 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 		goto error_return;
 
 	lock->ref_name = xstrdup(refname);
+	lock->count = 1;
 
 	if (raceproof_create_file(ref_file.buf, create_reflock, &lock->lk)) {
 		unable_to_lock_message(ref_file.buf, errno, err);
@@ -2535,6 +2541,12 @@ static int check_old_oid(struct ref_update *update, struct object_id *oid,
 	return -1;
 }
 
+struct files_transaction_backend_data {
+	struct ref_transaction *packed_transaction;
+	int packed_refs_locked;
+	struct strmap ref_locks;
+};
+
 /*
  * Prepare for carrying out update:
  * - Lock the reference referred to by update.
@@ -2557,11 +2569,14 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 {
 	struct strbuf referent = STRBUF_INIT;
 	int mustexist = ref_update_expects_existing_old_ref(update);
+	struct files_transaction_backend_data *backend_data;
 	int ret = 0;
 	struct ref_lock *lock;
 
 	files_assert_main_repository(refs, "lock_ref_for_update");
 
+	backend_data = transaction->backend_data;
+
 	if ((update->flags & REF_HAVE_NEW) && ref_update_has_null_new_value(update))
 		update->flags |= REF_DELETING;
 
@@ -2572,18 +2587,25 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			goto out;
 	}
 
-	ret = lock_raw_ref(refs, update->refname, mustexist,
-			   affected_refnames,
-			   &lock, &referent,
-			   &update->type, err);
-	if (ret) {
-		char *reason;
+	lock = strmap_get(&backend_data->ref_locks, update->refname);
+	if (lock) {
+		lock->count++;
+	} else {
+		ret = lock_raw_ref(refs, update->refname, mustexist,
+				   affected_refnames,
+				   &lock, &referent,
+				   &update->type, err);
+		if (ret) {
+			char *reason;
+
+			reason = strbuf_detach(err, NULL);
+			strbuf_addf(err, "cannot lock ref '%s': %s",
+				    ref_update_original_update_refname(update), reason);
+			free(reason);
+			goto out;
+		}
 
-		reason = strbuf_detach(err, NULL);
-		strbuf_addf(err, "cannot lock ref '%s': %s",
-			    ref_update_original_update_refname(update), reason);
-		free(reason);
-		goto out;
+		strmap_put(&backend_data->ref_locks, update->refname, lock);
 	}
 
 	update->backend_data = lock;
@@ -2730,11 +2752,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	return ret;
 }
 
-struct files_transaction_backend_data {
-	struct ref_transaction *packed_transaction;
-	int packed_refs_locked;
-};
-
 /*
  * Unlock any references in `transaction` that are still locked, and
  * mark the transaction closed.
@@ -2767,6 +2784,8 @@ static void files_transaction_cleanup(struct files_ref_store *refs,
 		if (backend_data->packed_refs_locked)
 			packed_refs_unlock(refs->packed_ref_store);
 
+		strmap_clear(&backend_data->ref_locks, 0);
+
 		free(backend_data);
 	}
 
@@ -2796,6 +2815,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		goto cleanup;
 
 	CALLOC_ARRAY(backend_data, 1);
+	strmap_init(&backend_data->ref_locks);
 	transaction->backend_data = backend_data;
 
 	/*

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v4 4/8] refs: extract out refname verification in transactions
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (2 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 3/8] refs/files: add count field to ref_lock Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-19 19:29         ` Toon Claes
  2024-12-16 16:44       ` [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
                         ` (5 subsequent siblings)
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified by:

  - Ensuring it is not a pseudoref.
  - Checking the refname format.

These checks will also be needed in a following commit where the
function to add reflog updates to the transaction is introduced. Extract
the code out into a new static function.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/refs.c b/refs.c
index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801 100644
--- a/refs.c
+++ b/refs.c
@@ -1196,6 +1196,28 @@ struct ref_update *ref_transaction_add_update(
 	return update;
 }
 
+static int transaction_refname_valid(const char *refname,
+				     const struct object_id *new_oid,
+				     unsigned int flags, struct strbuf *err)
+{
+	if (flags & REF_SKIP_REFNAME_VERIFICATION)
+		return 1;
+
+	if (is_pseudo_ref(refname)) {
+		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
+			    refname);
+		return 0;
+	} else if ((new_oid && !is_null_oid(new_oid)) ?
+		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
+		 !refname_is_safe(refname)) {
+		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
+			    refname);
+		return 0;
+	}
+
+	return 1;
+}
+
 int ref_transaction_update(struct ref_transaction *transaction,
 			   const char *refname,
 			   const struct object_id *new_oid,
@@ -1213,21 +1235,8 @@ int ref_transaction_update(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    ((new_oid && !is_null_oid(new_oid)) ?
-		     check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
-			   !refname_is_safe(refname))) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+	if (!transaction_refname_valid(refname, new_oid, flags, err))
 		return -1;
-	}
-
-	if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
-	    is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
-		return -1;
-	}
 
 	if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
 		BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);

-- 
2.47.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (3 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 4/8] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-19 19:30         ` Toon Claes
  2024-12-16 16:44       ` [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
                         ` (4 subsequent siblings)
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `ref_transaction_add_update()` function creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.
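
As a rough sketch of the ownership this implies: the update keeps its
own copy of the string, so the caller remains free to reuse or release
its buffer, and a NULL argument simply stays NULL. The helper below is
a hypothetical stand-in for git's `xstrdup_or_null()`; unlike the real
helper it returns NULL on allocation failure instead of dying.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a heap-allocated copy of `s`, or NULL when `s` is NULL.
 * The caller owns the result and must free() it.
 */
static char *demo_strdup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;

	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

The matching `free()` is the one added to `ref_transaction_free()` in
patch 1 of this series.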

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c                  |  7 +++++--
 refs/files-backend.c    | 14 ++++++++------
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c |  6 ++++--
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801..782bf1090af65196263a3c35ed18d878bb4f2967 100644
--- a/refs.c
+++ b/refs.c
@@ -1166,6 +1166,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg)
 {
 	struct ref_update *update;
@@ -1190,8 +1191,10 @@ struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if ((flags & REF_HAVE_OLD) && old_oid)
 		oidcpy(&update->old_oid, old_oid);
-	if (!(flags & REF_SKIP_CREATE_REFLOG))
+	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
+		update->committer_info = xstrdup_or_null(committer_info);
 		update->msg = normalize_reflog_message(msg);
+	}
 
 	return update;
 }
@@ -1253,7 +1256,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
 
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
-				   old_target, msg);
+				   old_target, NULL, msg);
 	return 0;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 02cb4907d8659e87a227fed4f60a5f6606be8764..255fed8354cae982f785b1b85340e2a1eeecf2a6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1270,7 +1270,7 @@ static void prune_ref(struct files_ref_store *refs, struct ref_to_prune *r)
 	ref_transaction_add_update(
 			transaction, r->name,
 			REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | REF_IS_PRUNING,
-			null_oid(), &r->oid, NULL, NULL, NULL);
+			null_oid(), &r->oid, NULL, NULL, NULL, NULL);
 	if (ref_transaction_commit(transaction, &err))
 		goto cleanup;
 
@@ -2417,7 +2417,7 @@ static int split_head_update(struct ref_update *update,
 			transaction, "HEAD",
 			update->flags | REF_LOG_ONLY | REF_NO_DEREF,
 			&update->new_oid, &update->old_oid,
-			NULL, NULL, update->msg);
+			NULL, NULL, update->committer_info, update->msg);
 
 	/*
 	 * Add "HEAD". This insertion is O(N) in the transaction
@@ -2481,7 +2481,8 @@ static int split_symref_update(struct ref_update *update,
 			transaction, referent, new_flags,
 			update->new_target ? NULL : &update->new_oid,
 			update->old_target ? NULL : &update->old_oid,
-			update->new_target, update->old_target, update->msg);
+			update->new_target, update->old_target, NULL,
+			update->msg);
 
 	new_update->parent_update = update;
 
@@ -2914,7 +2915,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					packed_transaction, update->refname,
 					REF_HAVE_NEW | REF_NO_DEREF,
 					&update->new_oid, NULL,
-					NULL, NULL, NULL);
+					NULL, NULL, NULL, NULL);
 		}
 	}
 
@@ -3094,12 +3095,13 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 			ref_transaction_add_update(loose_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, NULL);
+						   update->new_target, NULL, update->committer_info,
+						   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,
 						   &update->new_oid, &update->old_oid,
-						   NULL, NULL, NULL);
+						   NULL, NULL, update->committer_info, NULL);
 		}
 	}
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f5c733d099f0c6f1076a25f4f77d9d5eb345ec87..79b287c5ec5c7d8f759869cf93cda405640186dc 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -162,6 +162,7 @@ struct ref_update *ref_transaction_add_update(
 		const struct object_id *new_oid,
 		const struct object_id *old_oid,
 		const char *new_target, const char *old_target,
+		const char *committer_info,
 		const char *msg);
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c008f20be719fec3af6a8f81c821cb9c263764d7..b2e3ba877de9e59fea5a4d066eb13e60ef22a32b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1078,7 +1078,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			new_update = ref_transaction_add_update(
 					transaction, "HEAD",
 					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
-					&u->new_oid, &u->old_oid, NULL, NULL, u->msg);
+					&u->new_oid, &u->old_oid, NULL, NULL, NULL,
+					u->msg);
 			string_list_insert(&affected_refnames, new_update->refname);
 		}
 
@@ -1161,7 +1162,8 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 					transaction, referent.buf, new_flags,
 					u->new_target ? NULL : &u->new_oid,
 					u->old_target ? NULL : &u->old_oid,
-					u->new_target, u->old_target, u->msg);
+					u->new_target, u->old_target,
+					u->committer_info, u->msg);
 
 				new_update->parent_update = u;
 

-- 
2.47.1



* [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (4 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-19 19:32         ` Toon Claes
  2024-12-16 16:44       ` [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
                         ` (3 subsequent siblings)
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

Introduce a new function, `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few more things. It:
  - Enforces that only a reflog entry is added and does not update the
  ref itself.
  - Allows the users to also provide the committer information. This
  means clients can add reflog entries with custom committer
  information.
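
The flag handling behind the first point can be sketched like this. The
`TOY_*` values and `toy_reflog_update_flags()` are hypothetical; the real
flags are defined in Git's refs headers, and the real function clears
`REF_HAVE_OLD` only after the update has been queued so that the old
object ID is still recorded.

```c
/* Hypothetical flag values, for illustration only. */
#define TOY_NO_DEREF (1u << 0)
#define TOY_HAVE_NEW (1u << 1)
#define TOY_HAVE_OLD (1u << 2)
#define TOY_LOG_ONLY (1u << 3)

/*
 * Compute the flags that end up on a reflog-only update: force the
 * log-only and no-deref bits, and drop HAVE_OLD so that the old value
 * is written to the reflog entry without being verified against the ref.
 */
static unsigned int toy_reflog_update_flags(unsigned int flags)
{
	flags |= TOY_LOG_ONLY | TOY_NO_DEREF;
	flags &= ~TOY_HAVE_OLD;
	return flags;
}
```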

The `transaction_refname_valid()` function also modifies the error
message selectively based on the type of the update. This change also
affects reflog updates which go through `ref_transaction_update()`.
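
As a rough sketch of that message selection, assuming a plain character
buffer instead of a `strbuf` and leaving out translation, the pseudoref
case could look like this. `toy_pseudoref_error()` and the `TOY_LOG_ONLY`
value are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag value, for illustration only. */
#define TOY_LOG_ONLY (1u << 3)

/*
 * Format the "refusing to update" error for a pseudoref. The noun
 * changes depending on whether the update only touches the reflog.
 */
static int toy_pseudoref_error(char *buf, size_t len, unsigned int flags,
			       const char *refname)
{
	const char *what = (flags & TOY_LOG_ONLY) ?
		"reflog for pseudoref" : "pseudoref";

	return snprintf(buf, len, "refusing to update %s '%s'", what, refname);
}
```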

A follow-up commit will utilize this function to add reflog support to
`git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c               | 39 +++++++++++++++++++++++++++++++++++----
 refs.h               | 14 ++++++++++++++
 refs/files-backend.c | 24 ++++++++++++++++--------
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/refs.c b/refs.c
index 782bf1090af65196263a3c35ed18d878bb4f2967..8b3882cff17e5e3b0376f75654e32f81a23e5cb2 100644
--- a/refs.c
+++ b/refs.c
@@ -1207,14 +1207,14 @@ static int transaction_refname_valid(const char *refname,
 		return 1;
 
 	if (is_pseudo_ref(refname)) {
-		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
-			    refname);
+		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	} else if ((new_oid && !is_null_oid(new_oid)) ?
 		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
 		 !refname_is_safe(refname)) {
-		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
-			    refname);
+		const char *what = flags & REF_LOG_ONLY ? "reflog with bad name" : "ref with bad name";
+		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
 		return 0;
 	}
 
@@ -1257,6 +1257,37 @@ int ref_transaction_update(struct ref_transaction *transaction,
 	ref_transaction_add_update(transaction, refname, flags,
 				   new_oid, old_oid, new_target,
 				   old_target, NULL, msg);
+
+	return 0;
+}
+
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err)
+{
+	struct ref_update *update;
+
+	assert(err);
+
+	flags |= REF_LOG_ONLY | REF_NO_DEREF;
+
+	if (!transaction_refname_valid(refname, new_oid, flags, err))
+		return -1;
+
+	update = ref_transaction_add_update(transaction, refname, flags,
+					    new_oid, old_oid, NULL, NULL,
+					    committer_info, msg);
+	/*
+	 * While we do set the old_oid value, we unset the flag to skip
+	 * old_oid verification which only makes sense for refs.
+	 */
+	update->flags &= ~REF_HAVE_OLD;
+	update->index = index;
+
 	return 0;
 }
 
diff --git a/refs.h b/refs.h
index a5bedf48cf6de91005a7e8d0bf58ca98350397a6..b0dfc65ed2e59c4b66967840339f81e7746a96d3 100644
--- a/refs.h
+++ b/refs.h
@@ -727,6 +727,20 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err);
 
+/*
+ * Similar to `ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information. The index
+ * field can be utilized to order updates as desired. When not used, the
+ * updates default to being ordered by refname.
+ */
+int ref_transaction_update_reflog(struct ref_transaction *transaction,
+				  const char *refname,
+				  const struct object_id *new_oid,
+				  const struct object_id *old_oid,
+				  const char *committer_info, unsigned int flags,
+				  const char *msg, unsigned int index,
+				  struct strbuf *err);
+
 /*
  * Add a reference creation to transaction. new_oid is the value that
  * the reference should have after the update; it must not be
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 255fed8354cae982f785b1b85340e2a1eeecf2a6..c11213f52065bcf2fa7612df8f9500692ee2d02c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3080,10 +3080,12 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 
 		/*
-		 * packed-refs don't support symbolic refs and root refs, so we
-		 * have to queue these references via the loose transaction.
+		 * packed-refs don't support symbolic refs, root refs and reflogs,
+		 * so we have to queue these references via the loose transaction.
 		 */
-		if (update->new_target || is_root_ref(update->refname)) {
+		if (update->new_target ||
+		    is_root_ref(update->refname) ||
+		    (update->flags & REF_LOG_ONLY)) {
 			if (!loose_transaction) {
 				loose_transaction = ref_store_transaction_begin(&refs->base, 0, err);
 				if (!loose_transaction) {
@@ -3092,11 +3094,17 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 				}
 			}
 
-			ref_transaction_add_update(loose_transaction, update->refname,
-						   update->flags & ~REF_HAVE_OLD,
-						   update->new_target ? NULL : &update->new_oid, NULL,
-						   update->new_target, NULL, update->committer_info,
-						   NULL);
+			if (update->flags & REF_LOG_ONLY)
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags, &update->new_oid,
+							   &update->old_oid, NULL, NULL,
+							   update->committer_info, update->msg);
+			else
+				ref_transaction_add_update(loose_transaction, update->refname,
+							   update->flags & ~REF_HAVE_OLD,
+							   update->new_target ? NULL : &update->new_oid, NULL,
+							   update->new_target, NULL, update->committer_info,
+							   NULL);
 		} else {
 			ref_transaction_add_update(packed_transaction, update->refname,
 						   update->flags & ~REF_HAVE_OLD,

-- 
2.47.1



* [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (5 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-19 19:33         ` Toon Claes
  2024-12-16 16:44       ` [PATCH v4 8/8] refs: add support for migrating reflogs Karthik Nayak
                         ` (2 subsequent siblings)
  9 siblings, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs, and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.

So allow multiple reflog updates within a single transaction. The reflog
creation logic also isn't exposed to the end user. While this might
change in the future, it currently reduces the scope of issues to think
about.
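
The relaxed duplicate check can be sketched as below. This is a
simplified illustration, not the backend code: `toy_update`,
`toy_has_duplicate_refs()` and the `TOY_LOG_ONLY` value are hypothetical,
and a sorted array stands in for the `string_list` the backends use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag value, for illustration only. */
#define TOY_LOG_ONLY (1u << 0)

struct toy_update {
	const char *refname;
	unsigned int flags;
};

static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Return 1 if a refname is updated more than once, ignoring reflog-only
 * updates; 0 if there is no duplicate; -1 if allocation fails.
 */
static int toy_has_duplicate_refs(const struct toy_update *updates, size_t nr)
{
	const char **names;
	size_t i, names_nr = 0;
	int ret = 0;

	if (!nr)
		return 0;
	names = malloc(nr * sizeof(*names));
	if (!names)
		return -1;

	/* Reflog-only updates never take part in the duplicate check. */
	for (i = 0; i < nr; i++)
		if (!(updates[i].flags & TOY_LOG_ONLY))
			names[names_nr++] = updates[i].refname;

	/* After sorting, duplicates are adjacent. */
	qsort(names, names_nr, sizeof(*names), cmp_names);
	for (i = 1; i < names_nr; i++) {
		if (!strcmp(names[i - 1], names[i])) {
			ret = 1;
			break;
		}
	}

	free(names);
	return ret;
}
```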

In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a given
refname are added and as such in most scenarios the old behavior
remains.
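
The index arithmetic can be sketched as follows, assuming a simplified
log record. `toy_log` and `toy_assign_update_indices()` are hypothetical
names; the real code fills `struct reftable_log_record` and passes the
resulting range to `reftable_writer_set_limits()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a reftable log record. */
struct toy_log {
	uint64_t update_index;
};

/*
 * Give each log record the update index `ts + index[i]` and return the
 * largest value assigned, which becomes the upper limit of the table.
 * With all indices at 0 the result is `ts`, matching the old behavior.
 */
static uint64_t toy_assign_update_indices(struct toy_log *logs,
					  const unsigned int *index,
					  size_t nr, uint64_t ts)
{
	uint64_t max_update_index = ts;
	size_t i;

	for (i = 0; i < nr; i++) {
		logs[i].update_index = ts + index[i];
		if (logs[i].update_index > max_update_index)
			max_update_index = logs[i].update_index;
	}

	return max_update_index;
}
```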

This is required to add reflog migration support to `git refs migrate`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/files-backend.c    | 15 +++++++++++----
 refs/reftable-backend.c | 22 +++++++++++++++++++---
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
+	if (update->flags & REF_LOG_ONLY)
+		goto out;
+
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
@@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	 */
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *update = transaction->updates[i];
-		struct string_list_item *item =
-			string_list_append(&affected_refnames, update->refname);
+		struct string_list_item *item;
 
 		if ((update->flags & REF_IS_PRUNING) &&
 		    !(update->flags & REF_NO_DEREF))
 			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
+		if (update->flags & REF_LOG_ONLY)
+			continue;
+
+		item = string_list_append(&affected_refnames, update->refname);
 		/*
 		 * We store a pointer to update in item->util, but at
 		 * the moment we never use the value of this field
@@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 
 	/* Fail if a refname appears more than once in the transaction: */
 	for (i = 0; i < transaction->nr; i++)
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	string_list_sort(&affected_refnames);
 	if (ref_update_reject_duplicates(&affected_refnames, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		if (ret)
 			goto done;
 
-		string_list_append(&affected_refnames,
-				   transaction->updates[i]->refname);
+		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
+			string_list_append(&affected_refnames,
+					   transaction->updates[i]->refname);
 	}
 
 	/*
@@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
+	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				}
 
 				fill_reftable_log_record(log, &c);
-				log->update_index = ts;
+
+				/*
+				 * Updates are sorted by the writer. So updates for the same
+				 * refname need to contain different update indices.
+				 */
+				log->update_index = ts + u->index;
+
+				/*
+				 * Note the max update_index so the limit can be set later on.
+				 */
+				if (log->update_index > max_update_index)
+					max_update_index = log->update_index;
+
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
+		reftable_writer_set_limits(writer, ts, max_update_index);
+
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;

-- 
2.47.1



* [PATCH v4 8/8] refs: add support for migrating reflogs
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (6 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-16 16:44       ` Karthik Nayak
  2024-12-17  6:59       ` [PATCH v4 0/8] refs: add reflog support to `git refs migrate` Patrick Steinhardt
  2024-12-19 19:32       ` Toon Claes
  9 siblings, 0 replies; 93+ messages in thread
From: Karthik Nayak @ 2024-12-16 16:44 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, ps, Christian Couder

The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.

One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit added support
for adding reflog updates in ref transactions. Using the added
functionality, bake in reflog support for `git refs migrate`.

To ensure that the order of the reflogs is maintained in the new
backend, we set the index for each reflog update as we iterate over the
reflogs from the old reference backend.
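
The per-entry indexing can be sketched with a callback iterator. All
names here (`toy_entry`, `toy_migration`, `toy_for_each_entry()`,
`toy_migrate_entry()`) are hypothetical stand-ins; the patch itself uses
`refs_for_each_reflog_ent()` with `migrate_one_reflog_entry()` and
starts a fresh counter at 0 for every refname.

```c
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

struct toy_entry {
	const char *msg;
};

struct toy_migration {
	unsigned int index;                     /* next position to hand out */
	unsigned int recorded[TOY_MAX_ENTRIES]; /* positions handed out so far */
	size_t nr;
};

typedef int (*toy_each_entry_fn)(const struct toy_entry *entry, void *cb_data);

/* Call `fn` for every entry in order; stop at the first non-zero result. */
static int toy_for_each_entry(const struct toy_entry *entries, size_t nr,
			      toy_each_entry_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(&entries[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Tag each entry with the position at which it was visited. */
static int toy_migrate_entry(const struct toy_entry *entry, void *cb_data)
{
	struct toy_migration *data = cb_data;

	(void)entry; /* a real callback would queue the entry here */
	if (data->nr >= TOY_MAX_ENTRIES)
		return -1;
	data->recorded[data->nr++] = data->index++;
	return 0;
}
```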

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/git-refs.txt |  2 -
 refs.c                     | 92 ++++++++++++++++++++++++++++++++--------------
 t/t1460-refs-migrate.sh    | 73 +++++++++++++++++++++++++-----------
 3 files changed, 115 insertions(+), 52 deletions(-)

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index ce31f93061db5e5d16aca516dd3d15f6527db870..9829984b0a4c4f54ec7f9b6c6c7072f62b1d198d 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -57,8 +57,6 @@ KNOWN LIMITATIONS
 
 The ref format migration has several known limitations in its current form:
 
-* It is not possible to migrate repositories that have reflogs.
-
 * It is not possible to migrate repositories that have worktrees.
 
 * There is no way to block concurrent writes to the repository during an
diff --git a/refs.c b/refs.c
index 8b3882cff17e5e3b0376f75654e32f81a23e5cb2..5d541ddc41aa84905e688c92565aa18fbf55323b 100644
--- a/refs.c
+++ b/refs.c
@@ -30,6 +30,7 @@
 #include "date.h"
 #include "commit.h"
 #include "wildmatch.h"
+#include "ident.h"
 
 /*
  * List of all available backends
@@ -2673,6 +2674,7 @@ struct migration_data {
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
 	struct strbuf *errbuf;
+	struct strbuf sb;
 };
 
 static int migrate_one_ref(const char *refname, const char *referent UNUSED, const struct object_id *oid,
@@ -2705,6 +2707,52 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 	return ret;
 }
 
+struct reflog_migration_data {
+	unsigned int index;
+	const char *refname;
+	struct ref_store *old_refs;
+	struct ref_transaction *transaction;
+	struct strbuf *errbuf;
+	struct strbuf *sb;
+};
+
+static int migrate_one_reflog_entry(struct object_id *old_oid,
+				    struct object_id *new_oid,
+				    const char *committer,
+				    timestamp_t timestamp, int tz,
+				    const char *msg, void *cb_data)
+{
+	struct reflog_migration_data *data = cb_data;
+	const char *date;
+	int ret;
+
+	date = show_date(timestamp, tz, DATE_MODE(NORMAL));
+	strbuf_reset(data->sb);
+	/* committer contains name and email */
+	strbuf_addstr(data->sb, fmt_ident("", committer, WANT_BLANK_IDENT, date, 0));
+
+	ret = ref_transaction_update_reflog(data->transaction, data->refname,
+					    new_oid, old_oid, data->sb->buf,
+					    REF_HAVE_NEW | REF_HAVE_OLD, msg,
+					    data->index++, data->errbuf);
+	return ret;
+}
+
+static int migrate_one_reflog(const char *refname, void *cb_data)
+{
+	struct migration_data *migration_data = cb_data;
+	struct reflog_migration_data data = {
+		.refname = refname,
+		.old_refs = migration_data->old_refs,
+		.transaction = migration_data->transaction,
+		.errbuf = migration_data->errbuf,
+		.sb = &migration_data->sb,
+	};
+
+	return refs_for_each_reflog_ent(migration_data->old_refs, refname,
+					migrate_one_reflog_entry, &data);
+}
+
 static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
 {
 	struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
@@ -2771,13 +2819,6 @@ static int move_files(const char *from_path, const char *to_path, struct strbuf
 	return ret;
 }
 
-static int count_reflogs(const char *reflog UNUSED, void *payload)
-{
-	size_t *reflog_count = payload;
-	(*reflog_count)++;
-	return 0;
-}
-
 static int has_worktrees(void)
 {
 	struct worktree **worktrees = get_worktrees();
@@ -2802,8 +2843,9 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	struct ref_store *old_refs = NULL, *new_refs = NULL;
 	struct ref_transaction *transaction = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
-	struct migration_data data;
-	size_t reflog_count = 0;
+	struct migration_data data = {
+		.sb = STRBUF_INIT,
+	};
 	int did_migrate_refs = 0;
 	int ret;
 
@@ -2815,21 +2857,6 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 
 	old_refs = get_main_ref_store(repo);
 
-	/*
-	 * We do not have any interfaces that would allow us to write many
-	 * reflog entries. Once we have them we can remove this restriction.
-	 */
-	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
-		strbuf_addstr(errbuf, "cannot count reflogs");
-		ret = -1;
-		goto done;
-	}
-	if (reflog_count) {
-		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
-		ret = -1;
-		goto done;
-	}
-
 	/*
 	 * Worktrees complicate the migration because every worktree has a
 	 * separate ref storage. While it should be feasible to implement, this
@@ -2855,17 +2882,21 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	 *      This operation is safe as we do not yet modify the main
 	 *      repository.
 	 *
-	 *   3. If we're in dry-run mode then we are done and can hand over the
+	 *   3. Enumerate all reflogs and write them into the new ref storage.
+	 *      This operation is safe as we do not yet modify the main
+	 *      repository.
+	 *
+	 *   4. If we're in dry-run mode then we are done and can hand over the
 	 *      directory to the caller for inspection. If not, we now start
 	 *      with the destructive part.
 	 *
-	 *   4. Delete the old ref storage from disk. As we have a copy of refs
+	 *   5. Delete the old ref storage from disk. As we have a copy of refs
 	 *      in the new ref storage it's okay(ish) if we now get interrupted
 	 *      as there is an equivalent copy of all refs available.
 	 *
-	 *   5. Move the new ref storage files into place.
+	 *   6. Move the new ref storage files into place.
 	 *
-	 *   6. Change the repository format to the new ref format.
+	 *   7. Change the repository format to the new ref format.
 	 */
 	strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
 	if (!mkdtemp(new_gitdir.buf)) {
@@ -2907,6 +2938,10 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	if (ret < 0)
 		goto done;
 
+	ret = refs_for_each_reflog(old_refs, migrate_one_reflog, &data);
+	if (ret < 0)
+		goto done;
+
 	ret = ref_transaction_commit(transaction, errbuf);
 	if (ret < 0)
 		goto done;
@@ -2982,6 +3017,7 @@ int repo_migrate_ref_storage_format(struct repository *repo,
 	}
 	ref_transaction_free(transaction);
 	strbuf_release(&new_gitdir);
+	strbuf_release(&data.sb);
 	return ret;
 }
 
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index 1bfff3a7afd5acc470424dfe7ec3e97d45f5c481..f59bc4860f19c4af82dc6f2984bdb69d61fe3ec2 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -7,23 +7,44 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
+# Migrate the provided repository from one format to the other and
+# verify that the references and logs are migrated over correctly.
+# Usage: test_migration <repo> <format> <skip_reflog_verify>
+#   <repo> is the relative path to the repo to be migrated.
+#   <format> is the ref format to be migrated to.
+#   <skip_reflog_verify> (true or false) whether to skip reflog verification.
 test_migration () {
-	git -C "$1" for-each-ref --include-root-refs \
+	repo=$1 &&
+	format=$2 &&
+	skip_reflog_verify=${3:-false} &&
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >expect &&
-	git -C "$1" refs migrate --ref-format="$2" &&
-	git -C "$1" for-each-ref --include-root-refs \
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >expect_logs &&
+		git -C "$repo" reflog list >expect_log_list
+	fi &&
+
+	git -C "$repo" refs migrate --ref-format="$2" &&
+
+	git -C "$repo" for-each-ref --include-root-refs \
 		--format='%(refname) %(objectname) %(symref)' >actual &&
 	test_cmp expect actual &&
+	if ! $skip_reflog_verify
+	then
+		git -C "$repo" reflog --all >actual_logs &&
+		git -C "$repo" reflog list >actual_log_list &&
+		test_cmp expect_logs actual_logs &&
+		test_cmp expect_log_list actual_log_list
+	fi &&
 
-	git -C "$1" rev-parse --show-ref-format >actual &&
-	echo "$2" >expect &&
+	git -C "$repo" rev-parse --show-ref-format >actual &&
+	echo "$format" >expect &&
 	test_cmp expect actual
 }
 
 test_expect_success 'setup' '
-	rm -rf .git &&
-	# The migration does not yet support reflogs.
-	git config --global core.logAllRefUpdates false
+	rm -rf .git
 '
 
 test_expect_success "superfluous arguments" '
@@ -78,19 +99,6 @@ do
 			test_cmp expect err
 		'
 
-		test_expect_success "$from_format -> $to_format: migration with reflog fails" '
-			test_when_finished "rm -rf repo" &&
-			git init --ref-format=$from_format repo &&
-			test_config -C repo core.logAllRefUpdates true &&
-			test_commit -C repo logged &&
-			test_must_fail git -C repo refs migrate \
-				--ref-format=$to_format 2>err &&
-			cat >expect <<-EOF &&
-			error: migrating reflogs is not supported yet
-			EOF
-			test_cmp expect err
-		'
-
 		test_expect_success "$from_format -> $to_format: migration with worktree fails" '
 			test_when_finished "rm -rf repo" &&
 			git init --ref-format=$from_format repo &&
@@ -141,7 +149,7 @@ do
 			test_commit -C repo initial &&
 			test-tool -C repo ref-store main update-ref "" refs/heads/broken \
 				"$(test_oid 001)" "$ZERO_OID" REF_SKIP_CREATE_REFLOG,REF_SKIP_OID_VERIFICATION &&
-			test_migration repo "$to_format" &&
+			test_migration repo "$to_format" true &&
 			test_oid 001 >expect &&
 			git -C repo rev-parse refs/heads/broken >actual &&
 			test_cmp expect actual
@@ -195,6 +203,27 @@ do
 			git -C repo rev-parse --show-ref-format >actual &&
 			test_cmp expect actual
 		'
+
+		test_expect_success "$from_format -> $to_format: reflogs of symrefs with target deleted" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit -C repo initial &&
+			git -C repo branch branch-1 HEAD &&
+			git -C repo symbolic-ref refs/heads/symref refs/heads/branch-1 &&
+			cat >input <<-EOF &&
+			delete refs/heads/branch-1
+			EOF
+			git -C repo update-ref --stdin <input &&
+			test_migration repo "$to_format"
+		'
+
+		test_expect_success "$from_format -> $to_format: reflogs order is retained" '
+			test_when_finished "rm -rf repo" &&
+			git init --ref-format=$from_format repo &&
+			test_commit --date "100005000 +0700" --no-tag -C repo initial &&
+			test_commit --date "100003000 +0700" --no-tag -C repo second &&
+			test_migration repo "$to_format"
+		'
 	done
 done
 

-- 
2.47.1



* Re: [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (7 preceding siblings ...)
  2024-12-16 16:44       ` [PATCH v4 8/8] refs: add support for migrating reflogs Karthik Nayak
@ 2024-12-17  6:59       ` Patrick Steinhardt
  2024-12-17  9:35         ` karthik nayak
  2024-12-19 19:32       ` Toon Claes
  9 siblings, 1 reply; 93+ messages in thread
From: Patrick Steinhardt @ 2024-12-17  6:59 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Christian Couder

On Mon, Dec 16, 2024 at 05:44:25PM +0100, Karthik Nayak wrote:
> Changes in v4:
> - Fix broken tests, due to two reasons in patch 8/8:
>   - The `index` field in `reflog_migration_data` wasn't initialized to
>     0. This specifically doesn't break the test, but causes undefined
>     behavior. Fix this by using designated initializers.
>   - The strbuf within `migration_data` wasn't initialized when the flow
>     exited early, causing memory leaks. Fix this too by using designated
>     initializers.
> - Thanks to Junio for reporting and Patrick for helping shed some light
>   on these broken tests.
> - Link to v3: https://lore.kernel.org/r/20241215-320-git-refs-migrate-reflogs-v3-0-4127fe707b98@gmail.com
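
The designated-initializer fix quoted above relies on a C rule: members
that are not named in the initializer are zero-initialized. A minimal
sketch, using a hypothetical `toy_reflog_data` struct in place of
`struct reflog_migration_data`:

```c
#include <stddef.h>

/* Hypothetical, trimmed-down version of the migration callback data. */
struct toy_reflog_data {
	unsigned int index;
	const char *refname;
	void *transaction;
};

/*
 * Only `refname` is named in the initializer. `index` and `transaction`
 * are implicitly set to 0 and NULL, so the counter starts at 0 without
 * an explicit assignment.
 */
static struct toy_reflog_data toy_make_data(const char *refname)
{
	struct toy_reflog_data data = {
		.refname = refname,
	};

	return data;
}
```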

The range-diff looks as expected, so this version of the series looks
good to me. Thanks!

Patrick


* Re: [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-17  6:59       ` [PATCH v4 0/8] refs: add reflog support to `git refs migrate` Patrick Steinhardt
@ 2024-12-17  9:35         ` karthik nayak
  2024-12-17 21:28           ` Junio C Hamano
  0 siblings, 1 reply; 93+ messages in thread
From: karthik nayak @ 2024-12-17  9:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Dec 16, 2024 at 05:44:25PM +0100, Karthik Nayak wrote:
>> Changes in v4:
>> - Fix broken tests, due to two reasons in patch 8/8:
>>   - The `index` field in `reflog_migration_data` wasn't initialized to
>>     0. This specifically doesn't break the test, but causes undefined
>>     behavior. Fix this by using designated initializers.
>>   - The strbuf within `migration_data` wasn't initialized when the flow
>>     exited early, causing memory leaks. Fix this too by using designated
>>     initializers.
>> - Thanks to Junio for reporting and Patrick for helping shed some light
>>   on these broken tests.
>> - Link to v3: https://lore.kernel.org/r/20241215-320-git-refs-migrate-reflogs-v3-0-4127fe707b98@gmail.com
>
> The range-diff looks as expected, so this version of the series looks
> good to me. Thanks!
>
> Patrick

Thanks Patrick for your review!



* Re: [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-17  9:35         ` karthik nayak
@ 2024-12-17 21:28           ` Junio C Hamano
  0 siblings, 0 replies; 93+ messages in thread
From: Junio C Hamano @ 2024-12-17 21:28 UTC (permalink / raw)
  To: karthik nayak; +Cc: Patrick Steinhardt, git, Christian Couder

karthik nayak <karthik.188@gmail.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> The range-diff looks as expected, so this version of the series looks
>> good to me. Thanks!
>>
>> Patrick
>
> Thanks Patrick for your review!

Thanks, both.  Let's queue it again.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate`
  2024-12-16 16:44       ` [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
@ 2024-12-19 19:28         ` Toon Claes
  2024-12-20 10:09           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:28 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> The reftable backend, sorts its updates by refname before applying them,
> this ensures that the references are stored sorted. When migrating
> reflogs from one backend to another, the order of the reflogs must be
> maintained. Add a new `index` field to the `ref_update` struct to
> facilitate this.
>
> This field is used in the reftable backend's sort comparison function
> `transaction_update_cmp`, to ensure that indexed fields maintain their
> order.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/refs-internal.h    |  7 +++++++
>  refs/reftable-backend.c | 13 +++++++++++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 0fd95cdacd99e4a728c22f5286f6b3f0f360c110..f5c733d099f0c6f1076a25f4f77d9d5eb345ec87 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -115,6 +115,13 @@ struct ref_update {
>  	char *msg;
>  	char *committer_info;
>  
> +	/*
> +	 * The index overrides the default sort algorithm. This is needed
> +	 * when migrating reflogs and we want to ensure we carry over the
> +	 * same order.
> +	 */
> +	unsigned int index;
> +
>  	/*
>  	 * If this ref_update was split off of a symref update via
>  	 * split_symref_update(), then this member points at that
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
>  
>  static int transaction_update_cmp(const void *a, const void *b)
>  {
> -	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
> -		      ((struct reftable_transaction_update *)b)->update->refname);
> +	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
> +	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
> +
> +	/*
> +	 * If there is an index set, it should take preference (default is 0).
> +	 * This ensures that updates with indexes are sorted amongst themselves.
> +	 */
> +	if (update_a->update->index || update_b->update->index)

What if only one of the two has an index set and the other doesn't? Then
we compare an unset index with one that is set? Or am I being too paranoid?

--
Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 4/8] refs: extract out refname verification in transactions
  2024-12-16 16:44       ` [PATCH v4 4/8] refs: extract out refname verification in transactions Karthik Nayak
@ 2024-12-19 19:29         ` Toon Claes
  2024-12-20 10:30           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:29 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
> the refname of the update is verified for:
>
>   - Ensuring it is not a pseudoref.
>   - Checking the refname format.
>
> These checks will also be needed in a following commit where the
> function to add reflog updates to the transaction is introduced. Extract
> the code out into a new static function.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c | 37 +++++++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index f003e51c6bf5229bfbce8ce61ffad7cdba0572e0..9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1196,6 +1196,28 @@ struct ref_update *ref_transaction_add_update(
>  	return update;
>  }
>  
> +static int transaction_refname_valid(const char *refname,
> +				     const struct object_id *new_oid,
> +				     unsigned int flags, struct strbuf *err)
> +{
> +	if (flags & REF_SKIP_REFNAME_VERIFICATION)
> +		return 1;
> +
> +	if (is_pseudo_ref(refname)) {
> +		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
> +			    refname);
> +		return 0;

With this early return you don't need the `else` below? Why did you add
it?

> +	} else if ((new_oid && !is_null_oid(new_oid)) ?
> +		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
> +		 !refname_is_safe(refname)) {
> +		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
> +			    refname);
> +		return 0;
> +	}

I see you've swapped the order of checking whether it's a pseudoref with
checking whether the format is okay. I think this shouldn't make a big
difference, but it will give a different error message when attempting
to update an ill-formatted pseudoref. For me it makes more sense how
you've done it now. But because you mention both checks as bullet points
in the commit message, do you think it would make sense to say something
about them being swapped?

-- 
Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-16 16:44       ` [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
@ 2024-12-19 19:30         ` Toon Claes
  2024-12-20 10:44           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:30 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> The `ref_transaction_add_update()` creates the `ref_update` struct. To
> facilitate addition of reflogs in the next commit, the function needs to
> accommodate setting the `committer_info` field in the struct. So modify
> the function to also take `committer_info` as an argument and set it
> accordingly.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c                  |  7 +++++--
>  refs/files-backend.c    | 14 ++++++++------
>  refs/refs-internal.h    |  1 +
>  refs/reftable-backend.c |  6 ++++--
>  4 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 9c9f4940c60d3cdd34ce8f1e668d17b9da3cd801..782bf1090af65196263a3c35ed18d878bb4f2967 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1166,6 +1166,7 @@ struct ref_update *ref_transaction_add_update(
>  		const struct object_id *new_oid,
>  		const struct object_id *old_oid,
>  		const char *new_target, const char *old_target,
> +		const char *committer_info,
>  		const char *msg)
>  {
>  	struct ref_update *update;
> @@ -1190,8 +1191,10 @@ struct ref_update *ref_transaction_add_update(
>  		oidcpy(&update->new_oid, new_oid);
>  	if ((flags & REF_HAVE_OLD) && old_oid)
>  		oidcpy(&update->old_oid, old_oid);
> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
> +		update->committer_info = xstrdup_or_null(committer_info);

Why only include the committer_info when we're not skipping reflog
updates?

-- 
Toon


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-13 19:43       ` karthik nayak
@ 2024-12-19 19:31         ` Toon Claes
  2024-12-20 11:31           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:31 UTC (permalink / raw)
  To: karthik nayak, Patrick Steinhardt; +Cc: git, Christian Couder

karthik nayak <karthik.188@gmail.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> On Fri, Dec 13, 2024 at 11:36:50AM +0100, Karthik Nayak wrote:
>>> The `ref_transaction_add_update()` creates the `ref_update` struct. To
>>> facilitate addition of reflogs in the next commit, the function needs to
>>> accommodate setting the `committer_info` field in the struct. So modify
>>> the function to also take `committer_info` as an argument and set it
>>> accordingly.
>>
>> I was wondering a bit whether we could instead pull out a
>> `add_update_internal()` function so that we don't need to modify all
>> callers of `ref_transaction_add_update()`. Because ultimately, we don't
>> use the field anywhere except from `ref_transaction_add_reflog_update()`
>> as far as I can see.
>>
>> This is more of a thought than a strong opinion, so feel free to ignore.
>>
>
> Yes, that is a possible change, but the number of code changes are
> relatively low and I didn't think it made so much difference. Also
> because we'd now have one more function. But I don't mind doing it
> either, if anyone feels strongly about it, I'll happily make that
> change.

Yes, I agree the number of callsites isn't that large, but on the other
hand, I see various calls to this function having four `NULL`s in a row
as arguments. Personally, I think that starts to smell a bit.

Now, before you change anything. I'm not sure what Patrick was
suggesting? Would it mean we basically rename
`ref_transaction_add_update()` to `add_update_internal()` and create a
new wrapper function `ref_transaction_add_update()` that simply calls
`add_update_internal(<ARGS>..., NULL, msg)`? I don't think that's a
great solution either.

Alternatively, because ref_transaction_add_update() returns the `struct
ref_update`, why not add a function `ref_update_set_committer` and call
that where we need to set the committer? I see this also will help in a
future commit where you call ref_transaction_add_update() differently
depending on reflog updates.
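
A minimal sketch of what such a setter could look like follows. The struct
is a trimmed-down stand-in rather than Git's real `struct ref_update`, the
setter itself is hypothetical (it is only the name proposed above), and
`dup_or_null()` is a local helper standing in for Git's
`xstrdup_or_null()`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down stand-in for Git's `struct ref_update`; illustration only. */
struct ref_update_sketch {
	char *msg;
	char *committer_info;
};

/* Copy a string, passing NULL through (roughly what xstrdup_or_null() does). */
static char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical setter: replace any previous committer info with a copy of
 * `committer_info`, or clear it when NULL is passed.
 */
static void ref_update_set_committer(struct ref_update_sketch *update,
				     const char *committer_info)
{
	assert(update);
	free(update->committer_info);
	update->committer_info = dup_or_null(committer_info);
}
```

With a helper of this shape, only the callers that care about committer
information would need to change, and the existing callers of
ref_transaction_add_update() would keep their current argument lists.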

--
Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
                         ` (8 preceding siblings ...)
  2024-12-17  6:59       ` [PATCH v4 0/8] refs: add reflog support to `git refs migrate` Patrick Steinhardt
@ 2024-12-19 19:32       ` Toon Claes
  2024-12-20 11:23         ` karthik nayak
  9 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:32 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> The `git refs migrate` command was introduced in
> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
> 2024-06-06) to support migrating from one reference backend to another.
>
> One limitation of the feature was that it didn't support migrating
> repositories which contained reflogs. This isn't a requirement on the
> server side as repositories are stored as bare repositories (which do
> not contain any reflogs). Clients however generally use reflogs and
> until now couldn't use the `git refs migrate` command to migrate their
> repositories to the new reftable format.
>
> One of the issues for adding reflog support is that the ref transactions
> don't support reflogs additions:
>   1. While there is REF_LOG_ONLY flag, there is no function to utilize
>   the flag and add reflogs.
>   2. reference backends generally sort the updates by the refname. This
>   wouldn't work for reflogs which need to ensure that they maintain the
>   order of creation.
>   3. In the files backend, reflog entries are added by obtaining locks
>   on the refs themselves. This means each update in the transaction, will
>   obtain a ref_lock. This paradigm fails to accompany the fact that there
>   could be multiple reflog updates for a refname in a single transaction.
>   4. The backends check for duplicate entries, which doesn't make sense
>   in the context of adding multiple reflogs for a given refname.
>
> We overcome these issue we make the following changes:
>   - Update the ref_update structure to also include the committer
>   information. Using this, we can add a new function which only adds
>   reflog updates to the transaction.

Out of interest, I see various changes happen around committer info. But
why is the committer info more relevant for reflog updates, in contrast
to normal ref updates?

-- 
Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-16 16:44       ` [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
@ 2024-12-19 19:32         ` Toon Claes
  2024-12-19 20:25           ` Junio C Hamano
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:32 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> Introduce a new function `ref_transaction_update_reflog`, for clients to
> add a reflog update to a transaction. While the existing function
> `ref_transaction_update` also allows clients to add a reflog entry, this
> function does a few things more, It:
>   - Enforces that only a reflog entry is added and does not update the
>   ref itself.
>   - Allows the users to also provide the committer information. This
>   means clients can add reflog entries with custom committer
>   information.
>
> The `transaction_refname_valid()` function also modifies the error
> message selectively based on the type of the update. This change also
> affects reflog updates which go through `ref_transaction_update()`.
>
> A follow up commit will utilize this function to add reflog support to
> `git refs migrate`.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c               | 39 +++++++++++++++++++++++++++++++++++----
>  refs.h               | 14 ++++++++++++++
>  refs/files-backend.c | 24 ++++++++++++++++--------
>  3 files changed, 65 insertions(+), 12 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 782bf1090af65196263a3c35ed18d878bb4f2967..8b3882cff17e5e3b0376f75654e32f81a23e5cb2 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1207,14 +1207,14 @@ static int transaction_refname_valid(const char *refname,
>  		return 1;
>  
>  	if (is_pseudo_ref(refname)) {
> -		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
> -			    refname);
> +		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";

These strings are not localized.

> +		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
>  		return 0;
>  	} else if ((new_oid && !is_null_oid(new_oid)) ?
>  		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
>  		 !refname_is_safe(refname)) {
> -		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
> -			    refname);
> +		const char *what = flags & REF_LOG_ONLY ? "reflog with bad name" : "ref with bad name";

Also these strings are not localized.

-- 
Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-16 16:44       ` [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
@ 2024-12-19 19:33         ` Toon Claes
  2024-12-20 11:15           ` karthik nayak
  0 siblings, 1 reply; 93+ messages in thread
From: Toon Claes @ 2024-12-19 19:33 UTC (permalink / raw)
  To: Karthik Nayak, git; +Cc: Karthik Nayak, ps, Christian Couder

Karthik Nayak <karthik.188@gmail.com> writes:

> The reference transaction only allows a single update for a given
> reference to avoid conflicts. This, however, isn't an issue for reflogs.
> There are no conflicts to be resolved in reflogs and when migrating
> reflogs between backends we'd have multiple reflog entries for the same
> refname.
>
> So allow multiple reflog updates within a single transaction. Also the
> reflog creation logic isn't exposed to the end user. While this might
> change in the future, currently, this reduces the scope of issues to
> think about.
>
> In the reftable backend, the writer sorts all updates based on the
> update_index before writing to the block. When there are multiple
> reflogs for a given refname, it is essential that the order of the
> reflogs is maintained. So add the `index` value to the `update_index`.
> The `index` field is only set when multiple reflog entries for a given
> refname are added and as such in most scenarios the old behavior
> remains.
>
> This is required to add reflog migration support to `git refs migrate`.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs/files-backend.c    | 15 +++++++++++----
>  refs/reftable-backend.c | 22 +++++++++++++++++++---
>  2 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  
>  	update->backend_data = lock;
>  
> +	if (update->flags & REF_LOG_ONLY)
> +		goto out;
> +
>  	if (update->type & REF_ISSYMREF) {
>  		if (update->flags & REF_NO_DEREF) {
>  			/*
> @@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  	 */
>  	for (i = 0; i < transaction->nr; i++) {
>  		struct ref_update *update = transaction->updates[i];
> -		struct string_list_item *item =
> -			string_list_append(&affected_refnames, update->refname);
> +		struct string_list_item *item;
>  
>  		if ((update->flags & REF_IS_PRUNING) &&
>  		    !(update->flags & REF_NO_DEREF))
>  			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
>  
> +		if (update->flags & REF_LOG_ONLY)
> +			continue;
> +
> +		item = string_list_append(&affected_refnames, update->refname);
>  		/*
>  		 * We store a pointer to update in item->util, but at
>  		 * the moment we never use the value of this field
> @@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  
>  	/* Fail if a refname appears more than once in the transaction: */
>  	for (i = 0; i < transaction->nr; i++)
> -		string_list_append(&affected_refnames,
> -				   transaction->updates[i]->refname);
> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> +			string_list_append(&affected_refnames,
> +					   transaction->updates[i]->refname);
>  	string_list_sort(&affected_refnames);
>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		if (ret)
>  			goto done;
>  
> -		string_list_append(&affected_refnames,
> -				   transaction->updates[i]->refname);
> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> +			string_list_append(&affected_refnames,
> +					   transaction->updates[i]->refname);
>  	}
>  
>  	/*
> @@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  	struct reftable_log_record *logs = NULL;
>  	struct ident_split committer_ident = {0};
>  	size_t logs_nr = 0, logs_alloc = 0, i;
> +	uint64_t max_update_index = ts;
>  	const char *committer_info;
>  	int ret = 0;
>  
> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  				}
>  
>  				fill_reftable_log_record(log, &c);
> -				log->update_index = ts;
> +
> +				/*
> +				 * Updates are sorted by the writer. So updates for the same
> +				 * refname need to contain different update indices.
> +				 */
> +				log->update_index = ts + u->index;

During my review I was having a hard time figuring out when `u->index`
was not 0 and where it is being set. Can you maybe explain a bit?

> +
> +				/*
> +				 * Note the max update_index so the limit can be set later on.
> +				 */
> +				if (log->update_index > max_update_index)

Is there a lot of value in having this if clause? I was a bit confused
why it is here, because I think we can do the assignment to
max_update_index unconditionally.

> +					max_update_index = log->update_index;
> +
>  				log->refname = xstrdup(u->refname);
>  				memcpy(log->value.update.new_hash,
>  				       u->new_oid.hash, GIT_MAX_RAWSZ);
> @@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>  	 * and log blocks.
>  	 */
>  	if (logs) {
> +		reftable_writer_set_limits(writer, ts, max_update_index);

So max_update_index is used to set the limits on the current writer, but
using reftable_stack_next_update_index() it's also used to give the next
stack its starting point for its range. Now I'm not familiar enough
with the code, but are all stacks handled in sequential order? And how
does a stack relate to a reftable file?

> +
>  		ret = reftable_writer_add_logs(writer, logs, logs_nr);
>  		if (ret < 0)
>  			goto done;
>
> -- 
> 2.47.1

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-19 19:32         ` Toon Claes
@ 2024-12-19 20:25           ` Junio C Hamano
  2024-12-20 10:55             ` karthik nayak
  2024-12-20 12:58             ` [PATCH] refs: mark invalid refname message for translation Karthik Nayak
  0 siblings, 2 replies; 93+ messages in thread
From: Junio C Hamano @ 2024-12-19 20:25 UTC (permalink / raw)
  To: Toon Claes; +Cc: Karthik Nayak, git, ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

>> +		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
>
> These strings are not localized.
>
>> +		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
>>  		return 0;

And the structure forces sentence lego.  If "reflog for pseudoref"
were masculine and "pseudoref" were feminine in a language where the
verb "update" conjugates differently based on its object, the
resulting construction cannot be translated.  Rather, we'd need to
do something uglier like this:

	const char *refusal_msg;
	if (flags & REF_LOG_ONLY)
		refusal_msg = _("refusing to update reflog for pseudoref '%s'");
	else
		refusal_msg = _("refusing to update pseudoref '%s'");
	...
	strbuf_addf(err, refusal_msg, refname);


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate`
  2024-12-19 19:28         ` Toon Claes
@ 2024-12-20 10:09           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 10:09 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index e882602487c66261d586a94101bb1b4e9a2ed60e..c008f20be719fec3af6a8f81c821cb9c263764d7 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1279,8 +1279,17 @@ static int reftable_be_transaction_abort(struct ref_store *ref_store UNUSED,
>>
>>  static int transaction_update_cmp(const void *a, const void *b)
>>  {
>> -	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
>> -		      ((struct reftable_transaction_update *)b)->update->refname);
>> +	struct reftable_transaction_update *update_a = (struct reftable_transaction_update *)a;
>> +	struct reftable_transaction_update *update_b = (struct reftable_transaction_update *)b;
>> +
>> +	/*
>> +	 * If there is an index set, it should take preference (default is 0).
>> +	 * This ensures that updates with indexes are sorted amongst themselves.
>> +	 */
>> +	if (update_a->update->index || update_b->update->index)
>
> What if only one of the two has an index set and the other doesn't? Then
> we compare an unset index with one that is set? Or am I being too paranoid?
>
> --
> Toon

Those are expected scenarios. If one of them contains an index value,
then it'll be sorted before the other. In the end, we need:
1. Values with index to be sorted amongst themselves by index value.
2. Values without index to be sorted amongst themselves by the refname.
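
The two requirements can be illustrated with a small standalone
comparator. This is only a sketch: the struct is a hypothetical stand-in,
and the index comparison is assumed to be a plain three-way comparison.
Under that assumption entries with index 0 land ahead of indexed ones; how
the two groups are placed relative to each other depends on how the real
comparison is written:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down update record; not Git's actual struct. */
struct update_sketch {
	const char *refname;
	unsigned int index; /* 0 means "no index set" */
};

/*
 * Order by index when either side has one, otherwise by refname.
 * The three-way comparison avoids unsigned subtraction wrap-around.
 */
static int update_sketch_cmp(const void *a, const void *b)
{
	const struct update_sketch *ua = a;
	const struct update_sketch *ub = b;

	assert(ua && ub);
	if (ua->index || ub->index)
		return (ua->index > ub->index) - (ua->index < ub->index);
	return strcmp(ua->refname, ub->refname);
}

/* Sort an array of updates in place using the comparator above. */
static void sort_updates(struct update_sketch *updates, size_t nr)
{
	qsort(updates, nr, sizeof(*updates), update_sketch_cmp);
}
```

Sorting a mix of indexed and unindexed entries this way leaves the indexed
ones in ascending index order among themselves and the unindexed ones in
refname order among themselves.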

Karthik

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 4/8] refs: extract out refname verification in transactions
  2024-12-19 19:29         ` Toon Claes
@ 2024-12-20 10:30           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 10:30 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>> +static int transaction_refname_valid(const char *refname,
>> +				     const struct object_id *new_oid,
>> +				     unsigned int flags, struct strbuf *err)
>> +{
>> +	if (flags & REF_SKIP_REFNAME_VERIFICATION)
>> +		return 1;
>> +
>> +	if (is_pseudo_ref(refname)) {
>> +		strbuf_addf(err, _("refusing to update pseudoref '%s'"),
>> +			    refname);
>> +		return 0;
>
> With this early return you don't need the `else` below? Why did you add
> it?
>

You mean we could simply have

  if { ... check for pseudoref ... }

  if { ... check for bad refname ... }

  return 1;

then, you're right, that would work too. No specific reason that I added
an else.
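
Spelled out as compilable code, that shape could look roughly like the
sketch below. The two predicates are toy stand-ins for Git's
`is_pseudo_ref()` and its refname format checks, and the error buffer
replaces the `strbuf`; all names prefixed with `sketch_` are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for Git's is_pseudo_ref(); illustration only. */
static int sketch_is_pseudo_ref(const char *refname)
{
	return !strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD");
}

/* Toy stand-in for the refname format checks: non-empty, no "..". */
static int sketch_refname_ok(const char *refname)
{
	return *refname && !strstr(refname, "..");
}

/* Returns 1 when the refname is acceptable, 0 (with `err` filled) otherwise. */
static int sketch_refname_valid(const char *refname, char *err, size_t errlen)
{
	assert(refname && err && errlen);
	err[0] = '\0';

	if (sketch_is_pseudo_ref(refname)) {
		snprintf(err, errlen, "refusing to update pseudoref '%s'", refname);
		return 0;
	}

	if (!sketch_refname_ok(refname)) {
		snprintf(err, errlen, "refusing to update ref with bad name '%s'", refname);
		return 0;
	}

	return 1;
}
```

Because each failing check returns early, the second `if` does not need to
hang off an `else`.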

>> +	} else if ((new_oid && !is_null_oid(new_oid)) ?
>> +		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
>> +		 !refname_is_safe(refname)) {
>> +		strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
>> +			    refname);
>> +		return 0;
>> +	}
>
> I see you've swapped the order of checking whether it's a pseudoref with
> checking whether the format is okay. I think this shouldn't make a big
> difference, but it will give a different error message when attempting
> to update an ill-formatted pseudoref. For me it makes more sense how
> you've done it now. But because you mention both checks as bullet points
> in the commit message, do you think it would make sense to say something
> about them being swapped?
>

I actually didn't notice that I did swap them. It doesn't change the
logic. However, for creation of a pseudoref, in the old flow, we'd check
if the refname is safe and then go to the section where we check for
pseudorefs. Now we simply skip to the pseudoref check. I'll add more
details in the commit message locally for now and will include it if I
do re-roll!

> --
> Toon

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-19 19:30         ` Toon Claes
@ 2024-12-20 10:44           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 10:44 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:

[snip]

>>  {
>>  	struct ref_update *update;
>> @@ -1190,8 +1191,10 @@ struct ref_update *ref_transaction_add_update(
>>  		oidcpy(&update->new_oid, new_oid);
>>  	if ((flags & REF_HAVE_OLD) && old_oid)
>>  		oidcpy(&update->old_oid, old_oid);
>> -	if (!(flags & REF_SKIP_CREATE_REFLOG))
>> +	if (!(flags & REF_SKIP_CREATE_REFLOG)) {
>> +		update->committer_info = xstrdup_or_null(committer_info);
>
> Why only include the committer_info when we're not skipping reflog
> updates?
>
> --
> Toon

The `committer_info` contains information about:
1. the author of a ref update
2. the date/time of the update

This is only relevant in the context of reflogs. Regular ref updates
don't store this information. Hence we only add it for reflogs here.
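
As an illustration of what such a string carries, here is a small sketch
that splits one into its parts. It assumes the familiar ident layout
`Name <email> <timestamp> <tz>`; the struct and parser are hypothetical
and far less forgiving than Git's own ident handling:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a "Name <email> <timestamp> <tz>" string. */
struct committer_sketch {
	char name[64];
	char email[64];
	long timestamp; /* seconds since the epoch */
	char tz[8];     /* e.g. "-0700" */
};

/* Returns 0 on success, -1 if `info` does not match the assumed layout. */
static int parse_committer_sketch(const char *info, struct committer_sketch *out)
{
	const char *lt, *gt;
	char *end;
	size_t len;

	assert(info && out);
	lt = strchr(info, '<');
	gt = lt ? strchr(lt, '>') : NULL;
	if (!lt || !gt || lt == info || lt[-1] != ' ' || gt[1] != ' ')
		return -1;

	len = (size_t)(lt - info) - 1; /* name, minus the trailing space */
	if (len >= sizeof(out->name))
		return -1;
	memcpy(out->name, info, len);
	out->name[len] = '\0';

	len = (size_t)(gt - lt) - 1; /* text between '<' and '>' */
	if (len >= sizeof(out->email))
		return -1;
	memcpy(out->email, lt + 1, len);
	out->email[len] = '\0';

	out->timestamp = strtol(gt + 2, &end, 10);
	if (end == gt + 2 || *end != ' ' || strlen(end + 1) >= sizeof(out->tz))
		return -1;
	strcpy(out->tz, end + 1);
	return 0;
}
```

A plain ref update only records the new value, so none of these fields
have anywhere to go unless a reflog entry is written.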

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function
  2024-12-19 20:25           ` Junio C Hamano
@ 2024-12-20 10:55             ` karthik nayak
  2024-12-20 12:58             ` [PATCH] refs: mark invalid refname message for translation Karthik Nayak
  1 sibling, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 10:55 UTC (permalink / raw)
  To: Junio C Hamano, Toon Claes; +Cc: git, ps, Christian Couder

Junio C Hamano <gitster@pobox.com> writes:

> Toon Claes <toon@iotcl.com> writes:
>
>>> +		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
>>
>> These strings are not localized.
>>
>>> +		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
>>>  		return 0;
>
> And the structure forces sentence lego.  If "reflog for pseudoref"
> were masculine and "pseudoref" were feminine in a language where the
> verb "update" conjugates differently based on its object, the
> resulting construction cannot be translated.  Rather, we'd need to
> do something uglier like this:
>
> 	const char *refusal_msg;
> 	if (flags & REF_LOG_ONLY)
> 		refusal_msg = _("refusing to update reflog for pseudoref '%s'");
> 	else
> 		refusal_msg = _("refusing to update pseudoref '%s'");
> 	...
> 	strbuf_addf(err, refusal_msg, refname);

Indeed. Will add this in. I think it's better to prioritize good
translation here.
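
For reference, a self-contained version of the suggested pattern might
look like the sketch below. The `_()` macro is a no-op stand-in for
gettext so the example builds without libintl, `SKETCH_REF_LOG_ONLY` is an
illustrative flag value rather than Git's real `REF_LOG_ONLY`, and the
fixed-size buffer replaces the `strbuf`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext's _() so the sketch builds without libintl. */
#define _(msgid) (msgid)

/* Illustrative flag value; Git defines the real REF_LOG_ONLY elsewhere. */
#define SKETCH_REF_LOG_ONLY (1u << 0)

/*
 * Pick one complete, separately translatable sentence per case instead
 * of splicing a noun phrase into a shared template.
 */
static void format_pseudoref_refusal(char *err, size_t errlen,
				     unsigned int flags, const char *refname)
{
	const char *refusal_msg;

	assert(err && errlen && refname);
	if (flags & SKETCH_REF_LOG_ONLY)
		refusal_msg = _("refusing to update reflog for pseudoref '%s'");
	else
		refusal_msg = _("refusing to update pseudoref '%s'");
	snprintf(err, errlen, refusal_msg, refname);
}
```

Each branch hands the translator a whole sentence, so word order and
agreement can differ freely between the two messages.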

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname
  2024-12-19 19:33         ` Toon Claes
@ 2024-12-20 11:15           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 11:15 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The reference transaction only allows a single update for a given
>> reference to avoid conflicts. This, however, isn't an issue for reflogs.
>> There are no conflicts to be resolved in reflogs and when migrating
>> reflogs between backends we'd have multiple reflog entries for the same
>> refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> In the reftable backend, the writer sorts all updates based on the
>> update_index before writing to the block. When there are multiple
>> reflogs for a given refname, it is essential that the order of the
>> reflogs is maintained. So add the `index` value to the `update_index`.
>> The `index` field is only set when multiple reflog entries for a given
>> refname are added and as such in most scenarios the old behavior
>> remains.
>>
>> This is required to add reflog migration support to `git refs migrate`.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>  refs/files-backend.c    | 15 +++++++++++----
>>  refs/reftable-backend.c | 22 +++++++++++++++++++---
>>  2 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index c11213f52065bcf2fa7612df8f9500692ee2d02c..8953d1c6d37b13b0db701888b3db92fd87a68aaa 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -2611,6 +2611,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>
>>  	update->backend_data = lock;
>>
>> +	if (update->flags & REF_LOG_ONLY)
>> +		goto out;
>> +
>>  	if (update->type & REF_ISSYMREF) {
>>  		if (update->flags & REF_NO_DEREF) {
>>  			/*
>> @@ -2829,13 +2832,16 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>>  	 */
>>  	for (i = 0; i < transaction->nr; i++) {
>>  		struct ref_update *update = transaction->updates[i];
>> -		struct string_list_item *item =
>> -			string_list_append(&affected_refnames, update->refname);
>> +		struct string_list_item *item;
>>
>>  		if ((update->flags & REF_IS_PRUNING) &&
>>  		    !(update->flags & REF_NO_DEREF))
>>  			BUG("REF_IS_PRUNING set without REF_NO_DEREF");
>>
>> +		if (update->flags & REF_LOG_ONLY)
>> +			continue;
>> +
>> +		item = string_list_append(&affected_refnames, update->refname);
>>  		/*
>>  		 * We store a pointer to update in item->util, but at
>>  		 * the moment we never use the value of this field
>> @@ -3035,8 +3041,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>>
>>  	/* Fail if a refname appears more than once in the transaction: */
>>  	for (i = 0; i < transaction->nr; i++)
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	string_list_sort(&affected_refnames);
>>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>>  		ret = TRANSACTION_GENERIC_ERROR;
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..bec5962debea7b62572d08f6fa8fd38ab4cd8af6 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -990,8 +990,9 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>>  		if (ret)
>>  			goto done;
>>
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	}
>>
>>  	/*
>> @@ -1301,6 +1302,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  	struct reftable_log_record *logs = NULL;
>>  	struct ident_split committer_ident = {0};
>>  	size_t logs_nr = 0, logs_alloc = 0, i;
>> +	uint64_t max_update_index = ts;
>>  	const char *committer_info;
>>  	int ret = 0;
>>
>> @@ -1405,7 +1407,19 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  				}
>>
>>  				fill_reftable_log_record(log, &c);
>> -				log->update_index = ts;
>> +
>> +				/*
>> +				 * Updates are sorted by the writer. So updates for the same
>> +				 * refname need to contain different update indices.
>> +				 */
>> +				log->update_index = ts + u->index;
>
> During my review I was having a hard time figuring out when `u->index`
> was not 0 and where it is being set. Can you maybe explain a bit?
>

As of this patch, there are no users of the index. This patch adds the
infrastructure. The next patch is where we actually set the index.

In short, the index is only needed for the reftable backend. This is
because reflogs have a specific order and we need to retain that
order. In the reftable backend, all writes are sorted by refname as an
optimization. The index provides a parallel system to retain the order
of the updates. There are no real use cases apart from migration of
reflogs from one backend to another, which is added in the next patch.
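
To illustrate the ordering problem, here is a minimal standalone sketch.
It is not the reftable writer's code: `struct log_entry`, the comparator
and `assign_and_sort()` are hypothetical, and the sort order (by refname,
then newest update_index first) is an assumption made for the example.
The point is only that giving each entry `ts + index` makes the sort key
unique, so the result no longer depends on how the sort treats ties.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a reflog record. */
struct log_entry {
	const char *refname;
	uint64_t index;        /* position among entries for one refname */
	uint64_t update_index; /* filled in below as ts + index */
};

/*
 * Assumed ordering for this sketch: by refname, then by descending
 * update_index (newest entry first).
 */
static int log_entry_cmp(const void *a, const void *b)
{
	const struct log_entry *x = a, *y = b;
	int cmp = strcmp(x->refname, y->refname);

	if (cmp)
		return cmp;
	if (x->update_index == y->update_index)
		return 0;
	return x->update_index > y->update_index ? -1 : 1;
}

/*
 * Give every entry a distinct update_index per refname and sort.
 * qsort() is not guaranteed to be stable, so entries that compared
 * equal would end up in an unspecified order; `ts + index` avoids
 * equal keys for the same refname.
 */
static void assign_and_sort(struct log_entry *logs, size_t nr, uint64_t ts)
{
	size_t i;

	assert(logs || !nr);
	for (i = 0; i < nr; i++)
		logs[i].update_index = ts + logs[i].index;
	qsort(logs, nr, sizeof(*logs), log_entry_cmp);
}
```

With two entries each for 'HEAD' and 'refs/heads/main' (index 0 and 1)
and ts = 10, the entries for each refname come out with update indices
11 and 10, in that order, regardless of the input order.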

>> +
>> +				/*
>> +				 * Note the max update_index so the limit can be set later on.
>> +				 */
>> +				if (log->update_index > max_update_index)
>
> Is there a lot of value in having this if clause? I was a bit confused
> why it is here, because I think we can do the assignment to
> max_update_index unconditionally.
>

It is necessary. For reflogs whose index isn't set, their `update_index`
would simply be the `ts` value. So if there is a mix of reflog updates
with and without an index, an unconditional assignment could leave us
with a max that isn't the actual max.
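
A small standalone sketch of that scenario (the function names and the
plain array of index values are hypothetical, not taken from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the largest `ts + index[i]`. Entries whose index is 0 (unset)
 * contribute `ts` itself, so the comparison keeps a later entry with a
 * small index from overwriting an earlier, larger maximum.
 */
static uint64_t max_update_index(uint64_t ts, const uint64_t *index,
				 size_t nr)
{
	uint64_t max = ts;
	size_t i;

	assert(index || !nr);
	for (i = 0; i < nr; i++) {
		uint64_t update_index = ts + index[i];

		if (update_index > max)
			max = update_index;
	}
	return max;
}

/* The unconditional variant, for contrast: it keeps the last value seen. */
static uint64_t last_update_index(uint64_t ts, const uint64_t *index,
				  size_t nr)
{
	uint64_t max = ts;
	size_t i;

	assert(index || !nr);
	for (i = 0; i < nr; i++)
		max = ts + index[i];
	return max;
}
```

For ts = 10 and index values {0, 3, 0}, the first function returns 13,
while the unconditional variant returns 10 and would understate the
upper limit.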

>> +					max_update_index = log->update_index;
>> +
>>  				log->refname = xstrdup(u->refname);
>>  				memcpy(log->value.update.new_hash,
>>  				       u->new_oid.hash, GIT_MAX_RAWSZ);
>> @@ -1469,6 +1483,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  	 * and log blocks.
>>  	 */
>>  	if (logs) {
>> +		reftable_writer_set_limits(writer, ts, max_update_index);
>
> So max_update_index is used to set the limits on the current writer, but
> using reftable_stack_next_update_index() it's also used to give the next
> stack its starting point for their range.

Using `reftable_stack_next_update_index()` would return `ts + 1` as that
is the next sequential update. This could be less than
`max_update_index`, so we can't use that. Once all the reflogs are
written, the next call to `reftable_stack_next_update_index()` would
return `max_update_index + 1`.

> Now I'm not familiar enough with the code, but are all stacks handled
> in sequential order?

Not sure I understand your question correctly. Updates are handled as
per a given index. Each update is also sequentially stored. Tables are
named after the min and max index that they store.

> And how does a stack relate to a reftable file?

The stack is used to refer to a collection of reftable tables. So for a
given worktree, the tables under '$GIT_DIR/reftable' would constitute a
stack, where the 'tables.list' would state the tables which are part of
the stack.

>> +
>>  		ret = reftable_writer_add_logs(writer, logs, logs_nr);
>>  		if (ret < 0)
>>  			goto done;
>>
>> --
>> 2.47.1


* Re: [PATCH v4 0/8] refs: add reflog support to `git refs migrate`
  2024-12-19 19:32       ` Toon Claes
@ 2024-12-20 11:23         ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 11:23 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: ps, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The `git refs migrate` command was introduced in
>> 25a0023f28 (builtin/refs: new command to migrate ref storage formats,
>> 2024-06-06) to support migrating from one reference backend to another.
>>
>> One limitation of the feature was that it didn't support migrating
>> repositories which contained reflogs. This isn't a requirement on the
>> server side as repositories are stored as bare repositories (which do
>> not contain any reflogs). Clients however generally use reflogs and
>> until now couldn't use the `git refs migrate` command to migrate their
>> repositories to the new reftable format.
>>
>> One of the issues for adding reflog support is that the ref transactions
>> don't support reflogs additions:
>>   1. While there is REF_LOG_ONLY flag, there is no function to utilize
>>   the flag and add reflogs.
>>   2. reference backends generally sort the updates by the refname. This
>>   wouldn't work for reflogs which need to ensure that they maintain the
>>   order of creation.
>>   3. In the files backend, reflog entries are added by obtaining locks
>>   on the refs themselves. This means each update in the transaction, will
>>   obtain a ref_lock. This paradigm fails to accompany the fact that there
>>   could be multiple reflog updates for a refname in a single transaction.
>>   4. The backends check for duplicate entries, which doesn't make sense
>>   in the context of adding multiple reflogs for a given refname.
>>
>> We overcome these issue we make the following changes:
>>   - Update the ref_update structure to also include the committer
>>   information. Using this, we can add a new function which only adds
>>   reflog updates to the transaction.
>
> Out of interest, I see various changes happen around committer info. But
> why is the committer info more relevant for reflog updates, in contrast
> to normal ref updates?
>
> --
> Toon

The committer info is metadata around a ref update. It is only stored in
the reflogs, so it is only relevant for reflogs! For example, in the
files backend, we can see:

  $ cat .git/logs/HEAD | tail -1
  7a929cb27ca6df69d4db64b008e27e002c691028 bc8e00178b671da2b845ccbba175f6c093ed6949 Karthik Nayak <karthik.188@gmail.com> 1734624674 +0100	commit: c10

Here the 'Karthik Nayak <karthik.188@gmail.com> 1734624674 +0100'
section is the committer_info. Whereas if you look at the reference
itself:

  $ cat .git/HEAD
  ref: refs/heads/master

It only contains reference information. So, the `committer_info` is
not needed for regular ref updates. Earlier, we dynamically obtained the
information when a reflog was being created. But for migration of
existing reflogs we need to pass this information from the source to the
ref transaction mechanism. So we pass the `committer_info` through the
layers.
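
As an illustration of where the committer info sits in such a line,
here is a minimal sketch that splits one files-backend reflog line into
its parts. It is not git's parser: `struct reflog_fields` and
`split_reflog_line()` are hypothetical names, and the sketch assumes
the layout shown above (two object IDs separated by single spaces, then
the committer info, then a tab before the message).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the pieces of one reflog line. */
struct reflog_fields {
	const char *old_oid;
	const char *new_oid;
	const char *committer_info; /* "Name <email> timestamp tz" */
	const char *msg;
};

/*
 * Split `line` in place by overwriting the separators with NUL.
 * Returns 0 on success, -1 if the line does not have the assumed
 * "<old> <new> <committer info>\t<message>" shape.
 */
static int split_reflog_line(char *line, struct reflog_fields *out)
{
	char *sp1, *sp2, *tab;

	assert(line && out);
	sp1 = strchr(line, ' ');
	if (!sp1)
		return -1;
	sp2 = strchr(sp1 + 1, ' ');
	if (!sp2)
		return -1;
	tab = strchr(sp2 + 1, '\t');
	if (!tab)
		return -1;

	*sp1 = *sp2 = *tab = '\0';
	out->old_oid = line;
	out->new_oid = sp1 + 1;
	out->committer_info = sp2 + 1;
	out->msg = tab + 1;
	return 0;
}
```

Everything between the second object ID and the tab ends up in
`committer_info`, which is the string a migration has to carry over
verbatim instead of generating it at write time.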

Thank you for the review.


* Re: [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()`
  2024-12-19 19:31         ` Toon Claes
@ 2024-12-20 11:31           ` karthik nayak
  0 siblings, 0 replies; 93+ messages in thread
From: karthik nayak @ 2024-12-20 11:31 UTC (permalink / raw)
  To: Toon Claes, Patrick Steinhardt; +Cc: git, Christian Couder

Toon Claes <toon@iotcl.com> writes:

> karthik nayak <karthik.188@gmail.com> writes:
>
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>>> On Fri, Dec 13, 2024 at 11:36:50AM +0100, Karthik Nayak wrote:
>>>> The `ref_transaction_add_update()` creates the `ref_update` struct. To
>>>> facilitate addition of reflogs in the next commit, the function needs to
>>>> accommodate setting the `committer_info` field in the struct. So modify
>>>> the function to also take `committer_info` as an argument and set it
>>>> accordingly.
>>>
>>> I was wondering a bit whether we could instead pull out a
>>> `add_update_internal()` function so that we don't need to modify all
>>> callers of `ref_transaction_add_update()`. Because ultimately, we don't
>>> use the field anywhere except from `ref_transaction_add_reflog_update()`
>>> as far as I can see.
>>>
>>> This is more of a thought than a strong opinion, so feel free to ignore.
>>>
>>
>> Yes, that is a possible change, but the number of code changes are
>> relatively low and I didn't think it made so much difference. Also
>> because we'd now have one more function. But I don't mind doing it
>> either, if anyone feels strongly about it, I'll happily make that
>> change.
>
> Yes, I agree the number of callsites isn't that large, but on the other
> hand, I see various calls to this function having four `NULL`s in a row
> as arguments. Personally, I think that starts to smell a bit.
>

I agree with your reasoning here.

> Now, before you change anything. I'm not sure what Patrick was
> suggesting? Would it mean we basically rename
> `ref_transaction_add_update()` to `add_update_internal()` and create a
> new wrapper function `ref_transaction_add_update()` that simply calls
> `add_update_internal(<ARGS>..., NULL, msg)`? I don't think that's a
> great solution either.
>

Yes, agreed with this too.

> Alternatively, because ref_transaction_add_update() returns the `struct
> ref_update`, why not add a function `ref_update_set_committer` and call
> that where we need to set the committer? I see this also will help in a
> future commit where you call ref_transaction_add_update() differently
> depending on reflog updates.
>

I think this seems like a nice way to go about it. Currently all the
logic pertaining to creating a `ref_update` struct is contained within
`ref_transaction_add_update()`. So having individual functions would
make sense, but the con here is that this doesn't enforce fields to be
set. But for `committer_info` it does make sense. I'm going to leave it
for now since the series is merged to `next`. Maybe something to do in
the future as part of #leftoverbits.
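
For reference, a rough sketch of what such a setter could look like.
This is purely hypothetical and not part of the series:
`struct ref_update_sketch` is a cut-down stand-in for `struct
ref_update`, and the ownership rule (the setter keeps its own copy) is
an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down, hypothetical stand-in for `struct ref_update`. */
struct ref_update_sketch {
	char *committer_info; /* owned copy, or NULL when unset */
};

/*
 * Replace the committer info on an existing update. Passing NULL
 * clears it. The string is copied so the caller keeps ownership of
 * its argument.
 */
static void ref_update_set_committer(struct ref_update_sketch *update,
				     const char *committer_info)
{
	char *copy = NULL;

	assert(update);
	if (committer_info) {
		size_t len = strlen(committer_info) + 1;

		copy = malloc(len);
		if (!copy)
			abort();
		memcpy(copy, committer_info, len);
	}
	free(update->committer_info);
	update->committer_info = copy;
}
```

The trade-off mentioned above is visible here: nothing forces a caller
that adds a reflog-only update to call the setter at all.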

> --
> Toon


* [PATCH] refs: mark invalid refname message for translation
  2024-12-19 20:25           ` Junio C Hamano
  2024-12-20 10:55             ` karthik nayak
@ 2024-12-20 12:58             ` Karthik Nayak
  2024-12-20 15:53               ` Junio C Hamano
  1 sibling, 1 reply; 93+ messages in thread
From: Karthik Nayak @ 2024-12-20 12:58 UTC (permalink / raw)
  To: gitster; +Cc: git, karthik.188, toon

The error message produced by `transaction_refname_valid()` changes based
on whether the update is a ref update or a reflog update, with the use
of a ternary operator. This breaks translation since the sub-msg is not
marked for translation. Fix this by setting the entire message using a
`if {} else {}` block, with each full message marked for translation.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Since the reflog migration topic has been merged to 'next', I am sending this 
as an individual patch which applies on top of 'kn/reflog-migration'. 

Junio, I'd also be happy to re-roll the series if that is better.

 refs.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 58e543ce39..9c887427f2 100644
--- a/refs.c
+++ b/refs.c
@@ -1255,14 +1255,22 @@ static int transaction_refname_valid(const char *refname,
 		return 1;
 
 	if (is_pseudo_ref(refname)) {
-		const char *what = flags & REF_LOG_ONLY ? "reflog for pseudoref" : "pseudoref";
-		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
+		const char *refusal_msg;
+		if (flags & REF_LOG_ONLY)
+			refusal_msg = _("refusing to update reflog for pseudoref '%s'");
+		else
+			refusal_msg = _("refusing to update pseudoref '%s'");
+		strbuf_addf(err, refusal_msg, refname);
 		return 0;
 	} else if ((new_oid && !is_null_oid(new_oid)) ?
 		 check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
 		 !refname_is_safe(refname)) {
-		const char *what = flags & REF_LOG_ONLY ? "reflog with bad name" : "ref with bad name";
-		strbuf_addf(err, _("refusing to update %s '%s'"), what, refname);
+		const char *refusal_msg;
+		if (flags & REF_LOG_ONLY)
+			refusal_msg = _("refusing to update reflog with bad name '%s'");
+		else
+			refusal_msg = _("refusing to update ref with bad name '%s'");
+		strbuf_addf(err, refusal_msg, refname);
 		return 0;
 	}
 
-- 
2.47.1


* Re: [PATCH] refs: mark invalid refname message for translation
  2024-12-20 12:58             ` [PATCH] refs: mark invalid refname message for translation Karthik Nayak
@ 2024-12-20 15:53               ` Junio C Hamano
  2024-12-24 10:34                 ` Toon Claes
  0 siblings, 1 reply; 93+ messages in thread
From: Junio C Hamano @ 2024-12-20 15:53 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon

Karthik Nayak <karthik.188@gmail.com> writes:

> The error message produced by `transaction_refname_valid()` changes based
> on whether the update is a ref update or a reflog update, with the use
> of a ternary operator. This breaks translation since the sub-msg is not
> marked for translation. Fix this by setting the entire message using a
> `if {} else {}` block and marking each message for translation.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>
> Since the reflog migration topic has been merged to 'next', I am sending this 
> as an individual patch which applies on top of 'kn/reflog-migration'. 

Thanks, that is the most sensible way to fix up a glitch that was
discovered too late ;-)  Will queue.

* Re: [PATCH] refs: mark invalid refname message for translation
  2024-12-20 15:53               ` Junio C Hamano
@ 2024-12-24 10:34                 ` Toon Claes
  0 siblings, 0 replies; 93+ messages in thread
From: Toon Claes @ 2024-12-24 10:34 UTC (permalink / raw)
  To: Junio C Hamano, Karthik Nayak; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Since the reflog migration topic has been merged to 'next', I am sending this 
>> as an individual patch which applies on top of 'kn/reflog-migration'. 
>
> Thanks, that is the most sensible way to fix up a glitch that was
> discovered too late ;-)  Will queue.

Thanks both for dealing with this so swiftly! The error message is way
better like this.

-- 
Toon

Thread overview: 93+ messages
2024-12-09 11:07 [PATCH 0/7] refs: add reflog support to `git refs migrate` Karthik Nayak
2024-12-09 11:07 ` [PATCH 1/7] refs: include committer info in `ref_update` struct Karthik Nayak
2024-12-10 16:51   ` Christian Couder
2024-12-11 10:13     ` karthik nayak
2024-12-09 11:07 ` [PATCH 2/7] refs: add `index` field to `struct ref_udpate` Karthik Nayak
2024-12-09 11:07 ` [PATCH 3/7] refs/files: add count field to ref_lock Karthik Nayak
2024-12-10 17:22   ` Christian Couder
2024-12-11 10:18     ` karthik nayak
2024-12-11  9:05   ` Christian Couder
2024-12-11 10:26     ` karthik nayak
2024-12-09 11:07 ` [PATCH 4/7] refs: extract out refname verification in transactions Karthik Nayak
2024-12-11  9:26   ` Christian Couder
2024-12-11 10:31     ` karthik nayak
2024-12-11 14:26     ` Patrick Steinhardt
2024-12-09 11:07 ` [PATCH 5/7] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
2024-12-11 10:10   ` Christian Couder
2024-12-11 18:06     ` karthik nayak
2024-12-11 14:26   ` Patrick Steinhardt
2024-12-11 18:09     ` karthik nayak
2024-12-09 11:07 ` [PATCH 6/7] refs: allow multiple reflog entries for the same refname Karthik Nayak
2024-12-11 10:44   ` Christian Couder
2024-12-12 14:52     ` karthik nayak
2024-12-11 14:26   ` Patrick Steinhardt
2024-12-12 14:47     ` karthik nayak
2024-12-09 11:07 ` [PATCH 7/7] refs: add support for migrating reflogs Karthik Nayak
2024-12-11 14:26   ` Patrick Steinhardt
2024-12-12 14:04     ` karthik nayak
2024-12-10 12:13 ` [PATCH 0/7] refs: add reflog support to `git refs migrate` Junio C Hamano
2024-12-10 17:42   ` karthik nayak
2024-12-10 18:03     ` karthik nayak
2024-12-13 10:36 ` [PATCH v2 0/8] " Karthik Nayak
2024-12-13 10:36   ` [PATCH v2 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
2024-12-13 10:36   ` [PATCH v2 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
2024-12-13 10:36   ` [PATCH v2 3/8] refs/files: add count field to ref_lock Karthik Nayak
2024-12-13 10:36   ` [PATCH v2 4/8] refs: extract out refname verification in transactions Karthik Nayak
2024-12-13 10:36   ` [PATCH v2 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
2024-12-13 12:24     ` Patrick Steinhardt
2024-12-13 19:43       ` karthik nayak
2024-12-19 19:31         ` Toon Claes
2024-12-20 11:31           ` karthik nayak
2024-12-13 10:36   ` [PATCH v2 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
2024-12-13 11:44     ` Christian Couder
2024-12-13 19:49       ` karthik nayak
2024-12-13 12:24     ` Patrick Steinhardt
2024-12-13 10:36   ` [PATCH v2 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
2024-12-13 12:24     ` Patrick Steinhardt
2024-12-13 20:02       ` karthik nayak
2024-12-13 10:36   ` [PATCH v2 8/8] refs: add support for migrating reflogs Karthik Nayak
2024-12-13 12:24     ` Patrick Steinhardt
2024-12-15 11:09       ` karthik nayak
2024-12-15 16:25   ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 3/8] refs/files: add count field to ref_lock Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 4/8] refs: extract out refname verification in transactions Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
2024-12-15 16:25     ` [PATCH v3 8/8] refs: add support for migrating reflogs Karthik Nayak
2024-12-16  7:25       ` Patrick Steinhardt
2024-12-16 15:50         ` Junio C Hamano
2024-12-16 15:59           ` karthik nayak
2024-12-15 23:54     ` [PATCH v3 0/8] refs: add reflog support to `git refs migrate` Junio C Hamano
2024-12-16 14:33       ` karthik nayak
2024-12-16 16:32         ` Junio C Hamano
2024-12-16 16:44     ` [PATCH v4 " Karthik Nayak
2024-12-16 16:44       ` [PATCH v4 1/8] refs: include committer info in `ref_update` struct Karthik Nayak
2024-12-16 16:44       ` [PATCH v4 2/8] refs: add `index` field to `struct ref_udpate` Karthik Nayak
2024-12-19 19:28         ` Toon Claes
2024-12-20 10:09           ` karthik nayak
2024-12-16 16:44       ` [PATCH v4 3/8] refs/files: add count field to ref_lock Karthik Nayak
2024-12-16 16:44       ` [PATCH v4 4/8] refs: extract out refname verification in transactions Karthik Nayak
2024-12-19 19:29         ` Toon Claes
2024-12-20 10:30           ` karthik nayak
2024-12-16 16:44       ` [PATCH v4 5/8] refs: add `committer_info` to `ref_transaction_add_update()` Karthik Nayak
2024-12-19 19:30         ` Toon Claes
2024-12-20 10:44           ` karthik nayak
2024-12-16 16:44       ` [PATCH v4 6/8] refs: introduce the `ref_transaction_update_reflog` function Karthik Nayak
2024-12-19 19:32         ` Toon Claes
2024-12-19 20:25           ` Junio C Hamano
2024-12-20 10:55             ` karthik nayak
2024-12-20 12:58             ` [PATCH] refs: mark invalid refname message for translation Karthik Nayak
2024-12-20 15:53               ` Junio C Hamano
2024-12-24 10:34                 ` Toon Claes
2024-12-16 16:44       ` [PATCH v4 7/8] refs: allow multiple reflog entries for the same refname Karthik Nayak
2024-12-19 19:33         ` Toon Claes
2024-12-20 11:15           ` karthik nayak
2024-12-16 16:44       ` [PATCH v4 8/8] refs: add support for migrating reflogs Karthik Nayak
2024-12-17  6:59       ` [PATCH v4 0/8] refs: add reflog support to `git refs migrate` Patrick Steinhardt
2024-12-17  9:35         ` karthik nayak
2024-12-17 21:28           ` Junio C Hamano
2024-12-19 19:32       ` Toon Claes
2024-12-20 11:23         ` karthik nayak
