[PATCH v2 0/2] refs: introduce reftable backend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Han-Wen Nienhuys <hanwen@google.com>,
	Karthik Nayak <karthik.188@gmail.com>
Subject: [PATCH v2 0/2] refs: introduce reftable backend
Date: Fri, 2 Feb 2024 09:38:25 +0100	[thread overview]
Message-ID: <cover.1706862705.git.ps@pks.im> (raw)
In-Reply-To: <cover.1706601199.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 18602 bytes --]

Hi,

this is the second version of my patch series that introduces the
reftable backend.

This version addresses the feedback by Karthik. I don't think it really
makes sense to spell out every change here -- the range diff should
likely be easier to digest.

Thanks!

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2295 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3246 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v1:
1:  5598cd1307 ! 1:  d6548dcfad refs: introduce reftable backend
    @@ Commit message
             previously been deleting reflogs together with their refs to avoid
             file/directory conflicts, which is not necessary anymore.
     
    +      - We can properly enumerate all refs. With the "files" backend it is
    +        not easily possible to distinguish between refs and non-refs because
    +        they may live side by side in the gitdir.
    +
         Not all of these improvements are realized with the current "reftable"
         backend implementation. At this point, the new backend is supposed to be
         a drop-in replacement for the "files" backend that is used by basically
    @@ Commit message
             workloads we would likely also want to perform pack loose refs,
             which would likely change the picture.
     
    -            Benchmark 1: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
                   Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
                   Range (min … max):     1.8 ms …   4.3 ms    133 runs
     
    -            Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
                   Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
                   Range (min … max):     2.4 ms …   2.9 ms    132 runs
     
    -            Benchmark 3: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
                   Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
                   Range (min … max):    1.969 s …  1.980 s    3 runs
     
    -            Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
                   Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
                   Range (min … max):    2.597 s …  2.622 s    3 runs
     
    -            Benchmark 5: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
                   Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
                   Range (min … max):   198.189 s … 198.670 s    3 runs
     
    -            Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
                   Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
                   Range (min … max):   290.223 s … 298.761 s    3 runs
     
    @@ refs/reftable-backend.c (new)
     +#include "../git-compat-util.h"
     +#include "../abspath.h"
     +#include "../chdir-notify.h"
    -+#include "../config.h"
     +#include "../environment.h"
     +#include "../gettext.h"
     +#include "../hash.h"
    @@ refs/reftable-backend.c (new)
     +#include "../reftable/reftable-stack.h"
     +#include "../reftable/reftable-record.h"
     +#include "../reftable/reftable-error.h"
    -+#include "../reftable/reftable-blocksource.h"
    -+#include "../reftable/reftable-reader.h"
     +#include "../reftable/reftable-iterator.h"
     +#include "../reftable/reftable-merged.h"
    -+#include "../reftable/reftable-generic.h"
     +#include "../setup.h"
     +#include "../strmap.h"
    -+#include "../worktree.h"
     +#include "refs-internal.h"
     +
     +/*
    @@ refs/reftable-backend.c (new)
     +struct reftable_ref_store {
     +	struct ref_store base;
     +
    ++	/*
    ++	 * The main stack refers to the common dir and thus contains common
    ++	 * refs as well as refs of the main repository.
    ++	 */
     +	struct reftable_stack *main_stack;
    ++	/*
    ++	 * The worktree stack refers to the gitdir in case the refdb is opened
    ++	 * via a worktree. It thus contains the per-worktree refs.
    ++	 */
     +	struct reftable_stack *worktree_stack;
    ++	/*
    ++	 * Map of worktree stacks by their respective worktree names. The map
    ++	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
    ++	 */
     +	struct strmap worktree_stacks;
     +	struct reftable_write_options write_options;
     +
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
     +	 * This stack contains both the shared and the main worktree refs.
    ++	 *
    ++	 * Note that we don't try to resolve the path in case we have a
    ++	 * worktree because `get_common_dir_noenv()` already does it for us.
     +	 */
     +	is_worktree = get_common_dir_noenv(&path, gitdir);
     +	if (!is_worktree) {
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * If we're in a worktree we also need to set up the worktree reftable
     +	 * stack that is contained in the per-worktree GIT_DIR.
    ++	 *
    ++	 * Ideally, we would also add the stack to our worktree stack map. But
    ++	 * we have no way to figure out the worktree name here and thus can't
    ++	 * do it efficiently.
     +	 */
     +	if (is_worktree) {
     +		strbuf_reset(&path);
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		 /*
    -+		  * Otherwise, if we either have no worktree refs anymore or if
    -+		  * the common ref sorts before the next worktree ref, we need
    -+		  * to figure out whether the next common ref belongs to the
    -+		  * main worktree. In that case, it should be ignored.
    ++		  * We now know that the lexicographically-next ref is a common
    ++		  * ref. When the common ref is a shared one we return it.
     +		  */
     +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
     +				       NULL) == REF_WORKTREE_SHARED)
     +			return ITER_SELECT_1;
     +
    ++		/*
    ++		 * Otherwise, if the common ref is a per-worktree ref we skip
    ++		 * it because it would belong to the main worktree, not ours.
    ++		 */
     +		return ITER_SKIP_1;
     +	} else {
     +		return ITER_DONE;
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		if (u->type & REF_ISSYMREF) {
    ++			/*
    ++			 * The reftable stack is locked at this point already,
    ++			 * so it is safe to call `refs_resolve_ref_unsafe()`
    ++			 * here without causing races.
    ++			 */
     +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
     +								       &current_oid, NULL);
     +
     +			if (u->flags & REF_NO_DEREF) {
    -+				/*
    -+				 * The reftable stack is locked at this point
    -+				 * already, so it should be safe to call
    -+				 * `refs_resolve_ref_unsafe()` here.
    -+				 */
     +				if (u->flags & REF_HAVE_OLD && !resolved) {
     +					strbuf_addf(err, _("cannot lock ref '%s': "
     +						    "error reading reference"), u->refname);
    @@ refs/reftable-backend.c (new)
     +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
     +				tombstone = &logs[logs_nr++];
     +				tombstone->refname = xstrdup(u->refname);
    -+				tombstone->value_type = REFTABLE_LOG_DELETION,
    ++				tombstone->value_type = REFTABLE_LOG_DELETION;
     +				tombstone->update_index = log.update_index;
     +			}
     +
    @@ refs/reftable-backend.c (new)
     +
     +	/*
     +	 * When deleting the old reference we have to use two update indices:
    -+	 * one to delete the old ref and its reflog, and once to create the new
    -+	 * ref and its reflog. They need to be staged with two separate indices
    -+	 * because the new reflog needs to encode both the deletion of the old
    -+	 * branch and the creation of the new branch, and we cannot do two
    -+	 * changes to a reflog in a single update.
    ++	 * once to delete the old ref and its reflog, and once to create the
    ++	 * new ref and its reflog. They need to be staged with two separate
    ++	 * indices because the new reflog needs to encode both the deletion of
    ++	 * the old branch and the creation of the new branch, and we cannot do
    ++	 * two changes to a reflog in a single update.
     +	 */
     +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
     +	if (arg->delete_old)
    @@ refs/reftable-backend.c (new)
     +	struct reftable_reflog_iterator *iter =
     +		(struct reftable_reflog_iterator *)ref_iterator;
     +
    -+	if (iter->err) {
    -+		ref_iterator_abort(ref_iterator);
    -+		return iter->err;
    -+	}
    -+
    -+	while (1) {
    -+		int flags, ret;
    ++	while (!iter->err) {
    ++		int flags;
     +
    -+		ret = reftable_iterator_next_log(&iter->iter, &iter->log);
    -+		if (ret < 0) {
    -+			ref_iterator_abort(ref_iterator);
    -+			return ITER_ERROR;
    -+		}
    -+		if (ret > 0) {
    -+			if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    -+				return ITER_ERROR;
    -+			return ITER_DONE;
    -+		}
    ++		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
    ++		if (iter->err)
    ++			break;
     +
     +		/*
     +		 * We want the refnames that we have reflogs for, so we skip if
    @@ refs/reftable-backend.c (new)
     +		iter->base.refname = iter->log.refname;
     +		iter->base.oid = &iter->oid;
     +		iter->base.flags = flags;
    -+		return ITER_OK;
    ++
    ++		break;
     +	}
    ++
    ++	if (iter->err > 0) {
    ++		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    ++			return ITER_ERROR;
    ++		return ITER_DONE;
    ++	}
    ++
    ++	if (iter->err < 0) {
    ++		ref_iterator_abort(ref_iterator);
    ++		return ITER_ERROR;
    ++	}
    ++
    ++	return ITER_OK;
     +}
     +
     +static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/reftable-backend.c (new)
     +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
     +								  struct reftable_stack *stack)
     +{
    ++	struct reftable_merged_table *merged_table;
     +	struct reftable_reflog_iterator *iter;
    -+	struct reftable_merged_table *mt;
     +	int ret;
     +
     +	iter = xcalloc(1, sizeof(*iter));
    @@ refs/reftable-backend.c (new)
     +	iter->refs = refs;
     +	iter->base.oid = &iter->oid;
     +
    ++	ret = refs->err;
    ++	if (ret)
    ++		goto done;
    ++
     +	ret = reftable_stack_reload(refs->main_stack);
     +	if (ret < 0)
     +		goto done;
     +
    -+	mt = reftable_stack_merged_table(stack);
    -+	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
    ++	merged_table = reftable_stack_merged_table(stack);
    ++
    ++	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
     +	if (ret < 0)
     +		goto done;
     +
    @@ refs/reftable-backend.c (new)
     +					iterator_select, NULL);
     +}
     +
    ++static int yield_log_record(struct reftable_log_record *log,
    ++			    each_reflog_ent_fn fn,
    ++			    void *cb_data)
    ++{
    ++	struct object_id old_oid, new_oid;
    ++	const char *full_committer;
    ++
    ++	oidread(&old_oid, log->value.update.old_hash);
    ++	oidread(&new_oid, log->value.update.new_hash);
    ++
    ++	/*
    ++	 * When both the old object ID and the new object ID are null
    ++	 * then this is the reflog existence marker. The caller must
    ++	 * not be aware of it.
    ++	 */
    ++	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    ++		return 0;
    ++
    ++	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    ++				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    ++	return fn(&old_oid, &new_oid, full_committer,
    ++		  log->value.update.time, log->value.update.tz_offset,
    ++		  log->value.update.message, cb_data);
    ++}
    ++
     +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     +						   const char *refname,
     +						   each_reflog_ent_fn fn,
    @@ refs/reftable-backend.c (new)
     +	mt = reftable_stack_merged_table(stack);
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
     +	while (!ret) {
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer;
    -+
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
     +			break;
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		oidread(&old_oid, log.value.update.old_hash);
    -+		oidread(&new_oid, log.value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log.value.update.name, log.value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log.value.update.time, log.value.update.tz_offset,
    -+			 log.value.update.message, cb_data);
    ++		ret = yield_log_record(&log, fn, cb_data);
     +		if (ret)
     +			break;
     +	}
    @@ refs/reftable-backend.c (new)
     +
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
    -+			break;
    ++			goto done;
     +		if (ret > 0 || strcmp(log.refname, refname)) {
     +			reftable_log_record_release(&log);
     +			ret = 0;
    @@ refs/reftable-backend.c (new)
     +	}
     +
     +	for (i = logs_nr; i--;) {
    -+		struct reftable_log_record *log = &logs[i];
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer = "";
    -+
    -+		oidread(&old_oid, log->value.update.old_hash);
    -+		oidread(&new_oid, log->value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log->value.update.time, log->value.update.tz_offset,
    -+			 log->value.update.message, cb_data);
    ++		ret = yield_log_record(&logs[i], fn, cb_data);
     +		if (ret)
    -+			break;
    ++			goto done;
     +	}
     +
    ++done:
     +	reftable_iterator_destroy(&it);
     +	for (i = 0; i < logs_nr; i++)
     +		reftable_log_record_release(&logs[i]);
    @@ refs/reftable-backend.c (new)
     +		goto done;
     +
     +	ret = reftable_stack_reload(stack);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	/*
    -+	 * Seek the reflog to see whether it contains any reflog entries which
    -+	 * aren't marked for deletion.
    ++	 * Check whether we get at least one log record for the given ref name.
    ++	 * If so, the reflog exists, otherwise it doesn't.
     +	 */
    -+	while (1) {
    -+		ret = reftable_iterator_next_log(&it, &log);
    -+		if (ret < 0)
    -+			goto done;
    -+		if (ret > 0 || strcmp(log.refname, refname)) {
    -+			ret = 0;
    -+			goto done;
    -+		}
    -+
    -+		ret = 1;
    -+		break;
    ++	ret = reftable_iterator_next_log(&it, &log);
    ++	if (ret < 0)
    ++		goto done;
    ++	if (ret > 0) {
    ++		ret = 0;
    ++		goto done;
     +	}
     +
    ++	ret = strcmp(log.refname, refname) == 0;
    ++
     +done:
     +	reftable_iterator_destroy(&it);
     +	reftable_log_record_release(&log);
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		tombstone.refname = (char *)arg->refname,
    -+		tombstone.value_type = REFTABLE_LOG_DELETION,
    ++		tombstone.refname = (char *)arg->refname;
    ++		tombstone.value_type = REFTABLE_LOG_DELETION;
     +		tombstone.update_index = log.update_index;
     +
     +		ret = reftable_writer_add_log(writer, &tombstone);
2:  bb694fa132 = 2:  63eafc9f5b ci: add jobs to test with the reftable backend
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2024-02-02  8:38 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
2024-02-01 15:17   ` Karthik Nayak
2024-02-02  8:30     ` Patrick Steinhardt
2024-02-02 10:52       ` Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
2024-02-02  8:38 ` Patrick Steinhardt [this message]
2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
2024-02-02  8:38   ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-02 13:02   ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
2024-02-03 20:41   ` Junio C Hamano
2024-02-04  6:00     ` Patrick Steinhardt
2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-05 13:10   ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
2024-02-07 22:31     ` Jeff King
2024-02-08  5:11       ` Patrick Steinhardt
2024-02-15  5:39         ` Jeff King
2024-02-07  7:20   ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1706862705.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanwen@google.com \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.