From: "Aaron Paterson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Aaron Paterson <apaterson@pm.me>, Aaron Paterson <apaterson@pm.me>
Subject: [PATCH v2] odb: add write_packfile, for_each_unique_abbrev, convert_object_id
Date: Thu, 26 Mar 2026 13:39:43 +0000
Message-ID: <pull.2074.v2.git.1774532383055.gitgitgadget@gmail.com>
In-Reply-To: <pull.2074.git.1774530437562.gitgitgadget@gmail.com>

From: Aaron Paterson <apaterson@pm.me>

Add three vtable methods to odb_source that were not part of the
recent ps/odb-sources and ps/object-counting series:

 - write_packfile: ingest a pack from a file descriptor. The files
   backend chooses between index-pack (large packs) and
   unpack-objects (small packs below fetch.unpackLimit). Options
   cover thin-pack fixing, promisor marking, fsck, lockfile
   capture, and shallow file passing.

 - for_each_unique_abbrev: iterate objects matching a hex prefix
   for disambiguation. Searches loose objects via oidtree, then
   multi-pack indices, then non-MIDX packs.

 - convert_object_id: translate between hash algorithms using the
   loose object map. Used during SHA-1 to SHA-256 migration.

Also add ODB_SOURCE_HELPER to the source type enum, preparing for
the helper backend in a follow-up series.

The write_packfile vtable method replaces the pattern where callers
spawn index-pack/unpack-objects directly. fast-import is converted
to odb_write_packfile() here, which allows non-files backends to
handle pack ingestion through their own mechanism.

Signed-off-by: Aaron Paterson <apaterson@pm.me>
---
    odb: add write_packfile, for_each_unique_abbrev, convert_object_id
    
    This adds three ODB source vtable methods that were not part of the
    recent ps/odb-sources and ps/object-counting series, plus caller routing
    for object-name.c and fast-import.c.
    
    New vtable methods:
    
     * write_packfile: Ingest a pack from a file descriptor. The files
       backend chooses between index-pack (large packs) and unpack-objects
       (small packs below fetch.unpackLimit). Options cover thin-pack
       fixing, promisor marking, fsck, lockfile capture, and shallow file
       passing. Non-files backends can handle pack ingestion through their
       own mechanism.
    
     * for_each_unique_abbrev: Iterate objects matching a hex prefix for
       disambiguation. The files backend searches loose objects via oidtree,
       multi-pack indices, then non-MIDX packs.
    
     * convert_object_id: Translate between hash algorithms using the loose
       object map. Used during SHA-1 to SHA-256 migration.
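    For review, the files-backend choice between unpack-objects and
    index-pack can be summarized as a small standalone C sketch. It
    mirrors the checks in odb_source_files_write_packfile() in the
    diff below:
     * fetch.unpackLimit takes precedence over transfer.unpackLimit.
     * The default limit is 100.
     * A missing object count, a promisor pack, or lockfile capture
       always selects index-pack.
    The helper names are hypothetical. The config lookups are replaced
    by plain integer arguments, where -1 means "unset".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this only mirrors the decision in the patch. */
enum ingest_tool { INGEST_INDEX_PACK, INGEST_UNPACK_OBJECTS };

/* fetch.unpackLimit wins over transfer.unpackLimit; default is 100. */
static int effective_unpack_limit(int fetch_unpack_limit,
				  int transfer_unpack_limit)
{
	if (fetch_unpack_limit >= 0)
		return fetch_unpack_limit;
	if (transfer_unpack_limit >= 0)
		return transfer_unpack_limit;
	return 100;
}

static enum ingest_tool choose_ingest_tool(unsigned int nr_objects,
					   int unpack_limit,
					   bool from_promisor,
					   bool want_lockfile)
{
	assert(unpack_limit >= 0);
	/* No object count given (e.g. NULL options): keep the pack. */
	if (!nr_objects)
		return INGEST_INDEX_PACK;
	/* The patch always uses index-pack for these two cases. */
	if (from_promisor || want_lockfile)
		return INGEST_INDEX_PACK;
	/* Strictly below the limit: explode into loose objects. */
	return nr_objects < (unsigned int)unpack_limit ?
		INGEST_UNPACK_OBJECTS : INGEST_INDEX_PACK;
}
```

    With the default limit, a 10-object pack is unpacked into loose
    objects. A pack of exactly 100 objects is kept and indexed.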
    
    Caller routing:
    
     * object-name.c: The abbreviation and disambiguation paths
       (find_short_object_filename, find_abbrev_len_packed, and
       find_short_packed_object) directly access files-backend internals
       (loose cache, pack store, MIDX). These are converted to dispatch
       through the for_each_unique_abbrev vtable method, so that non-files
       backends participate through proper abstraction rather than being
       skipped.
    
     * fast-import.c: end_packfile() now passes the finished pack to
       odb_write_packfile() instead of indexing it, registering it, and
       calling odb_source_files_downcast() directly. gfi_unpack_entry()
       falls back to odb_read_object() when the pack slot is NULL
       (non-files backends ingest packs without registering them on disk).
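    The disambiguation callers now dispatch to a per-source lookup that
    works in two steps. First, a binary search finds the first index
    entry at or after the zero-padded prefix. Then a linear walk stops
    at the first non-matching entry, or when the callback returns
    non-zero.

    This is a minimal sketch of that lookup. It assumes a hypothetical
    4-byte hash and a sorted array in place of a pack .idx or MIDX. The
    callback plays the role of disambiguate_cb() and stops once a
    second candidate is seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAWSZ 4 /* hypothetical short hash size, for illustration only */

typedef int (*oid_cb)(const unsigned char *oid, void *data);

/* Compare the first 'len' hex digits of two raw hashes, nibble-wise. */
static int hex_prefix_matches(unsigned len, const unsigned char *a,
			      const unsigned char *b)
{
	while (len > 1) {
		if (*a++ != *b++)
			return 0;
		len -= 2;
	}
	/* Odd digit count: only the high nibble of the last byte counts. */
	return !len || !((*a ^ *b) & 0xf0);
}

/* First index whose hash is >= the zero-padded prefix (lower bound). */
static size_t lower_bound(const unsigned char (*oids)[RAWSZ], size_t nr,
			  const unsigned char *prefix)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(oids[mid], prefix, RAWSZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Call cb for every entry matching the prefix; non-zero from cb stops. */
static int for_each_prefix_match(const unsigned char (*oids)[RAWSZ],
				 size_t nr, const unsigned char *prefix,
				 unsigned hex_len, oid_cb cb, void *data)
{
	assert(hex_len <= 2 * RAWSZ);
	for (size_t i = lower_bound(oids, nr, prefix); i < nr; i++) {
		int ret;

		if (!hex_prefix_matches(hex_len, prefix, oids[i]))
			break; /* sorted input: nothing later can match */
		ret = cb(oids[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Disambiguation-style callback: stop as soon as two matches exist. */
struct candidates {
	int count;
};

static int stop_when_ambiguous(const unsigned char *oid, void *data)
{
	struct candidates *c = data;

	(void)oid;
	return ++c->count > 1;
}
```

    Because the array is sorted, the walk is bounded by the number of
    matches rather than the size of the index. This is the same reason
    the files backend can break out of its MIDX and pack loops on the
    first mismatch.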
    
    This addresses Patrick's feedback on the previous submission [1]: the
    correct fix for downcast sites is proper vtable abstraction, not
    skipping non-files backends.
    
    Additional:
    
     * ODB_SOURCE_HELPER added to the source type enum
     * self_contained_out output field on odb_write_packfile_options
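    For self_contained_out, the files backend runs index-pack with
    --check-self-contained-and-connected. This sketch assumes the
    exit-status convention that fetch-pack.c follows:
     * 0: the pack is self-contained.
     * 1: the pack was indexed but refers to objects outside it.
     * Any other status: failure.
    It maps that status onto the output field. The helper name is
    hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper. Returns 0 and sets *self_contained when the
 * index-pack run should be treated as successful, or -1 on failure.
 * Exit status 1 is only acceptable when the self-contained check was
 * requested, because index-pack uses it to mean "valid, not
 * self-contained".
 */
static int interpret_index_pack_status(int status, int check_self_contained,
				       int *self_contained)
{
	assert(self_contained);
	if (status == 0) {
		*self_contained = check_self_contained;
		return 0;
	}
	if (check_self_contained && status == 1) {
		*self_contained = 0;
		return 0;
	}
	return -1;
}
```

    Treating status 1 as a hard error would make every fetch of a thin
    or otherwise not self-contained pack fail whenever the check is
    requested.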
    
    Motivation: These methods are needed by the local helper backend series
    (Series 2) [2], which delegates object and reference storage to external
    git-local-<name> helper processes. sqlite-git [3] is a working proof of
    concept that stores objects, refs, and reflogs in a single SQLite
    database with full worktree support.
    
    CC: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>
    
    [1] https://github.com/gitgitgadget/git/pull/2068.patch
    [2] https://github.com/gitgitgadget/git/compare/master...MayCXC:git:ps/series-2-helpers-v3.patch
    [3] https://github.com/MayCXC/sqlite-git

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2074%2FMayCXC%2Fps%2Fseries-1-vtable-v3-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2074/MayCXC/ps/series-1-vtable-v3-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2074

Range-diff vs v1:

 1:  146c7ed0b2 ! 1:  5b3e9a8298 odb: add write_packfile, for_each_unique_abbrev, convert_object_id
     @@ Commit message
      
          Signed-off-by: Aaron Paterson <apaterson@pm.me>
      
     + ## builtin/fast-import.c ##
     +@@ builtin/fast-import.c: static void end_packfile(void)
     + 	running = 1;
     + 	clear_delta_base_cache();
     + 	if (object_count) {
     +-		struct odb_source_files *files = odb_source_files_downcast(pack_data->repo->objects->sources);
     +-		struct packed_git *new_p;
     + 		struct object_id cur_pack_oid;
     +-		char *idx_name;
     + 		int i;
     + 		struct branch *b;
     + 		struct tag *t;
     +@@ builtin/fast-import.c: static void end_packfile(void)
     + 					 object_count, cur_pack_oid.hash,
     + 					 pack_size);
     + 
     +-		if (object_count <= unpack_limit) {
     +-			if (!loosen_small_pack(pack_data)) {
     +-				invalidate_pack_id(pack_id);
     +-				goto discard_pack;
     +-			}
     +-		}
     ++		if (lseek(pack_data->pack_fd, 0, SEEK_SET) < 0)
     ++			die_errno(_("failed seeking to start of '%s'"),
     ++				  pack_data->pack_name);
     + 
     +-		close(pack_data->pack_fd);
     +-		idx_name = keep_pack(create_index());
     ++		if (odb_write_packfile(the_repository->objects,
     ++				       pack_data->pack_fd, NULL))
     ++			die(_("failed to ingest pack"));
     + 
     +-		/* Register the packfile with core git's machinery. */
     +-		new_p = packfile_store_load_pack(files->packed, idx_name, 1);
     +-		if (!new_p)
     +-			die(_("core Git rejected index %s"), idx_name);
     +-		all_packs[pack_id] = new_p;
     +-		free(idx_name);
     ++		/*
     ++		 * Non-files backends do not register a pack on disk,
     ++		 * so NULL out the slot to prevent use-after-free in
     ++		 * gfi_unpack_entry.
     ++		 */
     ++		all_packs[pack_id] = NULL;
     + 
     + 		/* Print the boundary */
     + 		if (pack_edges) {
     +-			fprintf(pack_edges, "%s:", new_p->pack_name);
     ++			fprintf(pack_edges, "pack-%s:",
     ++				hash_to_hex(pack_data->hash));
     + 			for (i = 0; i < branch_table_sz; i++) {
     + 				for (b = branch_table[i]; b; b = b->table_next_branch) {
     + 					if (b->pack_id == pack_id)
     +@@ builtin/fast-import.c: static void *gfi_unpack_entry(
     + {
     + 	enum object_type type;
     + 	struct packed_git *p = all_packs[oe->pack_id];
     ++	if (!p) {
     ++		/*
     ++		 * Pack was ingested by a non-files backend via
     ++		 * odb_write_packfile() and is no longer on disk.
     ++		 * Read the object back through the ODB instead.
     ++		 */
     ++		enum object_type type;
     ++		enum object_type odb_type;
     ++		return odb_read_object(the_repository->objects,
     ++				       &oe->idx.oid, &odb_type, sizep);
     ++	}
     + 	if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) {
     + 		/* The object is stored in the packfile we are writing to
     + 		 * and we have modified it since the last time we scanned
     +
       ## object-name.c ##
      @@
       #include "packfile.h"
     @@ odb.c: int odb_write_object_stream(struct object_database *odb,
       				const char *secondary_sources)
      
       ## odb.h ##
     -@@ odb.h: enum object_info_flags {
     - 	 * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK.
     - 	 */
     - 	OBJECT_INFO_FOR_PREFETCH = (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK),
     -+
     -+	/*
     -+	 * Only consider objects marked as "kept" (surviving GC). Used by
     -+	 * helper backends that track kept status per object. Backends that
     -+	 * do not support kept tracking should return -1 (not found).
     -+	 */
     -+	OBJECT_INFO_KEPT_ONLY = (1 << 5),
     - };
     - 
     - /*
      @@ odb.h: int odb_write_object_stream(struct object_database *odb,
       			    struct odb_write_stream *stream, size_t len,
       			    struct object_id *oid);


 builtin/fast-import.c |  43 ++++---
 object-name.c         |  79 ++++++++++---
 odb.c                 |  26 +++++
 odb.h                 |  19 ++++
 odb/source-files.c    | 259 ++++++++++++++++++++++++++++++++++++++++++
 odb/source.h          | 108 ++++++++++++++++++
 6 files changed, 498 insertions(+), 36 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 9fc6c35b74..160495d9b1 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -876,10 +876,7 @@ static void end_packfile(void)
 	running = 1;
 	clear_delta_base_cache();
 	if (object_count) {
-		struct odb_source_files *files = odb_source_files_downcast(pack_data->repo->objects->sources);
-		struct packed_git *new_p;
 		struct object_id cur_pack_oid;
-		char *idx_name;
 		int i;
 		struct branch *b;
 		struct tag *t;
@@ -891,26 +888,25 @@ static void end_packfile(void)
 					 object_count, cur_pack_oid.hash,
 					 pack_size);
 
-		if (object_count <= unpack_limit) {
-			if (!loosen_small_pack(pack_data)) {
-				invalidate_pack_id(pack_id);
-				goto discard_pack;
-			}
-		}
+		if (lseek(pack_data->pack_fd, 0, SEEK_SET) < 0)
+			die_errno(_("failed seeking to start of '%s'"),
+				  pack_data->pack_name);
 
-		close(pack_data->pack_fd);
-		idx_name = keep_pack(create_index());
+		if (odb_write_packfile(the_repository->objects,
+				       pack_data->pack_fd, NULL))
+			die(_("failed to ingest pack"));
 
-		/* Register the packfile with core git's machinery. */
-		new_p = packfile_store_load_pack(files->packed, idx_name, 1);
-		if (!new_p)
-			die(_("core Git rejected index %s"), idx_name);
-		all_packs[pack_id] = new_p;
-		free(idx_name);
+		/*
+		 * Non-files backends do not register a pack on disk,
+		 * so NULL out the slot to prevent use-after-free in
+		 * gfi_unpack_entry.
+		 */
+		all_packs[pack_id] = NULL;
 
 		/* Print the boundary */
 		if (pack_edges) {
-			fprintf(pack_edges, "%s:", new_p->pack_name);
+			fprintf(pack_edges, "pack-%s:",
+				hash_to_hex(pack_data->hash));
 			for (i = 0; i < branch_table_sz; i++) {
 				for (b = branch_table[i]; b; b = b->table_next_branch) {
 					if (b->pack_id == pack_id)
@@ -1239,6 +1235,17 @@ static void *gfi_unpack_entry(
 {
 	enum object_type type;
 	struct packed_git *p = all_packs[oe->pack_id];
+	if (!p) {
+		/*
+		 * Pack was ingested by a non-files backend via
+		 * odb_write_packfile() and is no longer on disk.
+		 * Read the object back through the ODB instead.
+		 * The reported type goes into the function-scope
+		 * 'type'; only the buffer and size are returned.
+		 */
+		return odb_read_object(the_repository->objects,
+				       &oe->idx.oid, &type, sizep);
+	}
 	if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) {
 		/* The object is stored in the packfile we are writing to
 		 * and we have modified it since the last time we scanned
diff --git a/object-name.c b/object-name.c
index e5adec4c9d..8f503b985f 100644
--- a/object-name.c
+++ b/object-name.c
@@ -20,6 +20,7 @@
 #include "packfile.h"
 #include "pretty.h"
 #include "object-file.h"
+#include "odb/source.h"
 #include "read-cache-ll.h"
 #include "repo-settings.h"
 #include "repository.h"
@@ -111,13 +112,28 @@ static enum cb_next match_prefix(const struct object_id *oid, void *arg)
 	return ds->ambiguous ? CB_BREAK : CB_CONTINUE;
 }
 
+static int disambiguate_cb(const struct object_id *oid,
+			   struct object_info *oi UNUSED, void *data)
+{
+	struct disambiguate_state *ds = data;
+	update_candidates(ds, oid);
+	return ds->ambiguous ? 1 : 0;
+}
+
 static void find_short_object_filename(struct disambiguate_state *ds)
 {
 	struct odb_source *source;
 
-	for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next)
-		oidtree_each(odb_source_loose_cache(source, &ds->bin_pfx),
-				&ds->bin_pfx, ds->len, match_prefix, ds);
+	for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
+		if (source->for_each_unique_abbrev) {
+			odb_source_for_each_unique_abbrev(
+				source, &ds->bin_pfx, ds->len,
+				disambiguate_cb, ds);
+		} else {
+			oidtree_each(odb_source_loose_cache(source, &ds->bin_pfx),
+					&ds->bin_pfx, ds->len, match_prefix, ds);
+		}
+	}
 }
 
 static int match_hash(unsigned len, const unsigned char *a, const unsigned char *b)
@@ -208,15 +224,23 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 
 	odb_prepare_alternates(ds->repo->objects);
 	for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
-		struct multi_pack_index *m = get_multi_pack_index(source);
-		if (m)
-			unique_in_midx(m, ds);
+		if (source->for_each_unique_abbrev) {
+			odb_source_for_each_unique_abbrev(
+				source, &ds->bin_pfx, ds->len,
+				disambiguate_cb, ds);
+		} else {
+			struct multi_pack_index *m = get_multi_pack_index(source);
+			if (m)
+				unique_in_midx(m, ds);
+		}
 	}
 
-	repo_for_each_pack(ds->repo, p) {
-		if (ds->ambiguous)
-			break;
-		unique_in_pack(p, ds);
+	if (!ds->repo->objects->sources->for_each_unique_abbrev) {
+		repo_for_each_pack(ds->repo, p) {
+			if (ds->ambiguous)
+				break;
+			unique_in_pack(p, ds);
+		}
 	}
 }
 
@@ -796,19 +820,38 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 	mad->init_len = mad->cur_len;
 }
 
-static void find_abbrev_len_packed(struct min_abbrev_data *mad)
+static int abbrev_len_cb(const struct object_id *oid,
+			 struct object_info *oi UNUSED, void *data)
 {
-	struct packed_git *p;
+	struct min_abbrev_data *mad = data;
+	extend_abbrev_len(oid, mad);
+	return 0;
+}
 
+static void find_abbrev_len_packed(struct min_abbrev_data *mad)
+{
 	odb_prepare_alternates(mad->repo->objects);
-	for (struct odb_source *source = mad->repo->objects->sources; source; source = source->next) {
-		struct multi_pack_index *m = get_multi_pack_index(source);
-		if (m)
-			find_abbrev_len_for_midx(m, mad);
+
+	for (struct odb_source *source = mad->repo->objects->sources;
+	     source; source = source->next) {
+		if (source->for_each_unique_abbrev) {
+			mad->init_len = 0;
+			odb_source_for_each_unique_abbrev(
+				source, mad->oid, mad->cur_len,
+				abbrev_len_cb, mad);
+			mad->init_len = mad->cur_len;
+		} else {
+			struct multi_pack_index *m = get_multi_pack_index(source);
+			if (m)
+				find_abbrev_len_for_midx(m, mad);
+		}
 	}
 
-	repo_for_each_pack(mad->repo, p)
-		find_abbrev_len_for_pack(p, mad);
+	if (!mad->repo->objects->sources->for_each_unique_abbrev) {
+		struct packed_git *p;
+		repo_for_each_pack(mad->repo, p)
+			find_abbrev_len_for_pack(p, mad);
+	}
 }
 
 void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
diff --git a/odb.c b/odb.c
index 350e23f3c0..3032d5492c 100644
--- a/odb.c
+++ b/odb.c
@@ -981,6 +981,32 @@ int odb_write_object_stream(struct object_database *odb,
 	return odb_source_write_object_stream(odb->sources, stream, len, oid);
 }
 
+int odb_write_packfile(struct object_database *odb,
+		       int pack_fd,
+		       struct odb_write_packfile_options *opts)
+{
+	return odb_source_write_packfile(odb->sources, pack_fd, opts);
+}
+
+int odb_for_each_unique_abbrev(struct object_database *odb,
+			       const struct object_id *oid_prefix,
+			       unsigned int prefix_len,
+			       odb_for_each_object_cb cb,
+			       void *cb_data)
+{
+	int ret;
+
+	odb_prepare_alternates(odb);
+	for (struct odb_source *source = odb->sources; source; source = source->next) {
+		ret = odb_source_for_each_unique_abbrev(source, oid_prefix,
+							prefix_len, cb, cb_data);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 struct object_database *odb_new(struct repository *repo,
 				const char *primary_source,
 				const char *secondary_sources)
diff --git a/odb.h b/odb.h
index 9aee260105..b7f1a24006 100644
--- a/odb.h
+++ b/odb.h
@@ -570,6 +570,25 @@ int odb_write_object_stream(struct object_database *odb,
 			    struct odb_write_stream *stream, size_t len,
 			    struct object_id *oid);
 
+/*
+ * Ingest a pack from a file descriptor into the primary source.
+ * Returns 0 on success, a negative error code otherwise.
+ */
+struct odb_write_packfile_options;
+int odb_write_packfile(struct object_database *odb,
+		       int pack_fd,
+		       struct odb_write_packfile_options *opts);
+
+/*
+ * Iterate over all objects across all sources whose ID starts with
+ * the given prefix. Used for object name disambiguation.
+ */
+int odb_for_each_unique_abbrev(struct object_database *odb,
+			       const struct object_id *oid_prefix,
+			       unsigned int prefix_len,
+			       odb_for_each_object_cb cb,
+			       void *cb_data);
+
 void parse_alternates(const char *string,
 		      int sep,
 		      const char *relative_base,
diff --git a/odb/source-files.c b/odb/source-files.c
index c08d8993e3..e450c87f91 100644
--- a/odb/source-files.c
+++ b/odb/source-files.c
@@ -1,14 +1,21 @@
 #include "git-compat-util.h"
 #include "abspath.h"
 #include "chdir-notify.h"
+#include "config.h"
 #include "gettext.h"
 #include "lockfile.h"
+#include "loose.h"
+#include "midx.h"
 #include "object-file.h"
 #include "odb.h"
 #include "odb/source.h"
 #include "odb/source-files.h"
+#include "pack-objects.h"
 #include "packfile.h"
+#include "run-command.h"
 #include "strbuf.h"
+#include "strvec.h"
+#include "oidtree.h"
 #include "write-or-die.h"
 
 static void odb_source_files_reparent(const char *name UNUSED,
@@ -232,6 +239,255 @@ out:
 	return ret;
 }
 
+static int odb_source_files_write_packfile(struct odb_source *source,
+					   int pack_fd,
+					   struct odb_write_packfile_options *opts)
+{
+	struct odb_source_files *files = odb_source_files_downcast(source);
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
+	int use_index_pack = 1;
+	int ret;
+
+	if (opts && opts->nr_objects) {
+		int transfer_unpack_limit = -1;
+		int fetch_unpack_limit = -1;
+		int unpack_limit = 100;
+
+		repo_config_get_int(source->odb->repo, "fetch.unpacklimit",
+				    &fetch_unpack_limit);
+		repo_config_get_int(source->odb->repo, "transfer.unpacklimit",
+				    &transfer_unpack_limit);
+		if (0 <= fetch_unpack_limit)
+			unpack_limit = fetch_unpack_limit;
+		else if (0 <= transfer_unpack_limit)
+			unpack_limit = transfer_unpack_limit;
+
+		if (opts->nr_objects < (unsigned int)unpack_limit &&
+		    !opts->from_promisor && !opts->lockfile_out)
+			use_index_pack = 0;
+	}
+
+	cmd.in = pack_fd;
+	cmd.git_cmd = 1;
+
+	if (!use_index_pack) {
+		strvec_push(&cmd.args, "unpack-objects");
+		if (opts && opts->quiet)
+			strvec_push(&cmd.args, "-q");
+		if (opts && opts->pack_header_version)
+			strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
+				     opts->pack_header_version,
+				     opts->pack_header_entries);
+		repo_config_get_bool(source->odb->repo, "transfer.fsckobjects",
+				     &fsck_objects);
+		repo_config_get_bool(source->odb->repo, "receive.fsckobjects",
+				     &fsck_objects);
+		if (fsck_objects)
+			strvec_push(&cmd.args, "--strict");
+		if (opts && opts->max_input_size)
+			strvec_pushf(&cmd.args, "--max-input-size=%lu",
+				     opts->max_input_size);
+		ret = run_command(&cmd);
+		if (ret)
+			return error(_("unpack-objects failed"));
+		return 0;
+	}
+
+	strvec_push(&cmd.args, "index-pack");
+	strvec_push(&cmd.args, "--stdin");
+	strvec_push(&cmd.args, "--keep=write_packfile");
+
+	if (opts && opts->pack_header_version)
+		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
+			     opts->pack_header_version,
+			     opts->pack_header_entries);
+
+	if (opts) {
+		if (opts->use_thin_pack)
+			strvec_push(&cmd.args, "--fix-thin");
+		if (opts->from_promisor)
+			strvec_push(&cmd.args, "--promisor");
+		if (opts->check_self_contained)
+			strvec_push(&cmd.args, "--check-self-contained-and-connected");
+		if (opts->max_input_size)
+			strvec_pushf(&cmd.args, "--max-input-size=%lu",
+				     opts->max_input_size);
+		if (opts->shallow_file)
+			strvec_pushf(&cmd.env, "GIT_SHALLOW_FILE=%s",
+				     opts->shallow_file);
+		if (opts->report_end_of_input)
+			strvec_push(&cmd.args, "--report-end-of-input");
+		if (opts->fsck_objects)
+			fsck_objects = 1;
+	}
+
+	if (!fsck_objects) {
+		repo_config_get_bool(source->odb->repo, "transfer.fsckobjects",
+				     &fsck_objects);
+		repo_config_get_bool(source->odb->repo, "fetch.fsckobjects",
+				     &fsck_objects);
+	}
+	if (fsck_objects)
+		strvec_push(&cmd.args, "--strict");
+
+	if (opts && opts->lockfile_out) {
+		cmd.out = -1;
+		ret = start_command(&cmd);
+		if (ret)
+			return error(_("index-pack failed to start"));
+		*opts->lockfile_out = index_pack_lockfile(source->odb->repo,
+							  cmd.out, NULL);
+		close(cmd.out);
+		ret = finish_command(&cmd);
+	} else {
+		ret = run_command(&cmd);
+	}
+
+	if (ret && !(opts && opts->check_self_contained && ret == 1))
+		return error(_("index-pack failed"));
+
+	if (opts && opts->check_self_contained)
+		opts->self_contained_out = !ret;
+
+	packfile_store_reprepare(files->packed);
+	return 0;
+}
+
+static int match_hash_prefix(unsigned len, const unsigned char *a,
+			     const unsigned char *b)
+{
+	while (len > 1) {
+		if (*a != *b)
+			return 0;
+		a++; b++; len -= 2;
+	}
+	if (len)
+		if ((*a ^ *b) & 0xf0)
+			return 0;
+	return 1;
+}
+
+struct abbrev_cb_data {
+	odb_for_each_object_cb cb;
+	void *cb_data;
+	int ret;
+};
+
+static enum cb_next abbrev_loose_cb(const struct object_id *oid, void *data)
+{
+	struct abbrev_cb_data *d = data;
+	d->ret = d->cb(oid, NULL, d->cb_data);
+	return d->ret ? CB_BREAK : CB_CONTINUE;
+}
+
+static int odb_source_files_for_each_unique_abbrev(struct odb_source *source,
+						   const struct object_id *oid_prefix,
+						   unsigned int prefix_len,
+						   odb_for_each_object_cb cb,
+						   void *cb_data)
+{
+	struct odb_source_files *files = odb_source_files_downcast(source);
+	struct multi_pack_index *m;
+	struct packfile_list_entry *entry;
+	unsigned int hexsz = source->odb->repo->hash_algo->hexsz;
+	unsigned int len = prefix_len > hexsz ? hexsz : prefix_len;
+
+	/* Search loose objects */
+	{
+		struct oidtree *tree = odb_source_loose_cache(source, oid_prefix);
+		if (tree) {
+			struct abbrev_cb_data d = { cb, cb_data, 0 };
+			oidtree_each(tree, oid_prefix, prefix_len, abbrev_loose_cb, &d);
+			if (d.ret)
+				return d.ret;
+		}
+	}
+
+	/* Search multi-pack indices */
+	m = get_multi_pack_index(source);
+	for (; m; m = m->base_midx) {
+		uint32_t num, i, first = 0;
+
+		if (!m->num_objects)
+			continue;
+
+		num = m->num_objects + m->num_objects_in_base;
+		bsearch_one_midx(oid_prefix, m, &first);
+
+		for (i = first; i < num; i++) {
+			struct object_id oid;
+			const struct object_id *current;
+			int ret;
+
+			current = nth_midxed_object_oid(&oid, m, i);
+			if (!match_hash_prefix(len, oid_prefix->hash, current->hash))
+				break;
+			ret = cb(current, NULL, cb_data);
+			if (ret)
+				return ret;
+		}
+	}
+
+	/* Search packs not covered by MIDX */
+	for (entry = packfile_store_get_packs(files->packed); entry; entry = entry->next) {
+		struct packed_git *p = entry->pack;
+		uint32_t num, i, first = 0;
+
+		if (p->multi_pack_index)
+			continue;
+		if (open_pack_index(p) || !p->num_objects)
+			continue;
+
+		num = p->num_objects;
+		bsearch_pack(oid_prefix, p, &first);
+
+		for (i = first; i < num; i++) {
+			struct object_id oid;
+			int ret;
+
+			nth_packed_object_id(&oid, p, i);
+			if (!match_hash_prefix(len, oid_prefix->hash, oid.hash))
+				break;
+			ret = cb(&oid, NULL, cb_data);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int odb_source_files_convert_object_id(struct odb_source *source,
+					      const struct object_id *src,
+					      const struct git_hash_algo *to,
+					      struct object_id *dest)
+{
+	struct odb_source_files *files = odb_source_files_downcast(source);
+	struct loose_object_map *map;
+	kh_oid_map_t *hash_map;
+	khiter_t pos;
+
+	if (!files->loose || !files->loose->map)
+		return -1;
+
+	map = files->loose->map;
+
+	if (to == source->odb->repo->compat_hash_algo)
+		hash_map = map->to_compat;
+	else if (to == source->odb->repo->hash_algo)
+		hash_map = map->to_storage;
+	else
+		return -1;
+
+	pos = kh_get_oid_map(hash_map, *src);
+	if (pos == kh_end(hash_map))
+		return -1;
+
+	oidcpy(dest, kh_value(hash_map, pos));
+	return 0;
+}
+
 struct odb_source_files *odb_source_files_new(struct object_database *odb,
 					      const char *path,
 					      bool local)
@@ -256,6 +512,9 @@ struct odb_source_files *odb_source_files_new(struct object_database *odb,
 	files->base.begin_transaction = odb_source_files_begin_transaction;
 	files->base.read_alternates = odb_source_files_read_alternates;
 	files->base.write_alternate = odb_source_files_write_alternate;
+	files->base.write_packfile = odb_source_files_write_packfile;
+	files->base.for_each_unique_abbrev = odb_source_files_for_each_unique_abbrev;
+	files->base.convert_object_id = odb_source_files_convert_object_id;
 
 	/*
 	 * Ideally, we would only ever store absolute paths in the source. This
diff --git a/odb/source.h b/odb/source.h
index 96c906e7a1..8b898f80ed 100644
--- a/odb/source.h
+++ b/odb/source.h
@@ -13,12 +13,42 @@ enum odb_source_type {
 
 	/* The "files" backend that uses loose objects and packfiles. */
 	ODB_SOURCE_FILES,
+
+	/* An external helper process (git-local-<name>). */
+	ODB_SOURCE_HELPER,
 };
 
 struct object_id;
 struct odb_read_stream;
 struct strvec;
 
+/*
+ * Options for write_packfile. When NULL is passed, the backend
+ * uses sensible defaults.
+ */
+struct odb_write_packfile_options {
+	unsigned int nr_objects;
+	uint32_t pack_header_version;
+	uint32_t pack_header_entries;
+	int use_thin_pack;
+	int from_promisor;
+	int fsck_objects;
+	int check_self_contained;
+	unsigned long max_input_size;
+	int quiet;
+	int show_progress;
+	int report_end_of_input;
+	const char *shallow_file;
+	char **lockfile_out;
+
+	/*
+	 * Output: set to 1 by the backend if the ingested pack was
+	 * verified as self-contained (all referenced objects present).
+	 * Used by the transport layer to skip connectivity checks.
+	 */
+	int self_contained_out;
+};
+
 /*
  * The source is the part of the object database that stores the actual
  * objects. It thus encapsulates the logic to read and write the specific
@@ -237,6 +267,45 @@ struct odb_source {
 	 */
 	int (*write_alternate)(struct odb_source *source,
 			       const char *alternate);
+
+	/*
+	 * Ingest a pack from a file descriptor. Each backend chooses
+	 * its own ingestion strategy:
+	 *
+	 *   - The files backend spawns index-pack (large packs) or
+	 *     unpack-objects (small packs), then registers the result.
+	 *
+	 *   - Non-files backends may parse the pack and write each
+	 *     object individually through write_object.
+	 *
+	 * Returns 0 on success, a negative error code otherwise.
+	 */
+	int (*write_packfile)(struct odb_source *source,
+			      int pack_fd,
+			      struct odb_write_packfile_options *opts);
+
+	/*
+	 * Iterate over all objects whose object ID starts with the
+	 * given prefix. Used for object name disambiguation.
+	 *
+	 * Returns 0 on success, a negative error code in case
+	 * iteration has failed, or a non-zero value from the callback.
+	 */
+	int (*for_each_unique_abbrev)(struct odb_source *source,
+				      const struct object_id *oid_prefix,
+				      unsigned int prefix_len,
+				      odb_for_each_object_cb cb,
+				      void *cb_data);
+
+	/*
+	 * Translate an object ID from one hash algorithm to another
+	 * using the source's internal mapping (for SHA-1/SHA-256
+	 * migration). Returns 0 on success, -1 if no mapping exists.
+	 */
+	int (*convert_object_id)(struct odb_source *source,
+				 const struct object_id *src,
+				 const struct git_hash_algo *to,
+				 struct object_id *dest);
 };
 
 /*
@@ -442,4 +511,43 @@ static inline int odb_source_begin_transaction(struct odb_source *source,
 	return source->begin_transaction(source, out);
 }
 
+/*
+ * Ingest a pack from a file descriptor into the given source. Returns 0 on
+ * success, a negative error code otherwise.
+ */
+static inline int odb_source_write_packfile(struct odb_source *source,
+					    int pack_fd,
+					    struct odb_write_packfile_options *opts)
+{
+	return source->write_packfile(source, pack_fd, opts);
+}
+
+/*
+ * Iterate over all objects in the source whose ID starts with the given
+ * prefix. Used for object name disambiguation.
+ */
+static inline int odb_source_for_each_unique_abbrev(struct odb_source *source,
+						    const struct object_id *oid_prefix,
+						    unsigned int prefix_len,
+						    odb_for_each_object_cb cb,
+						    void *cb_data)
+{
+	return source->for_each_unique_abbrev(source, oid_prefix, prefix_len,
+					      cb, cb_data);
+}
+
+/*
+ * Translate an object ID between hash algorithms using the source's mapping.
+ * Returns 0 on success, -1 if no mapping exists.
+ */
+static inline int odb_source_convert_object_id(struct odb_source *source,
+					       const struct object_id *src,
+					       const struct git_hash_algo *to,
+					       struct object_id *dest)
+{
+	if (!source->convert_object_id)
+		return -1;
+	return source->convert_object_id(source, src, to, dest);
+}
+
 #endif

base-commit: 41688c1a2312f62f44435e1a6d03b4b904b5b0ec
-- 
gitgitgadget

Thread overview: 5+ messages
2026-03-26 13:07 [PATCH] odb: add write_packfile, for_each_unique_abbrev, convert_object_id Aaron Paterson via GitGitGadget
2026-03-26 13:39 ` Aaron Paterson via GitGitGadget [this message]
2026-03-26 13:58   ` [PATCH v2] " Patrick Steinhardt
2026-03-26 14:21     ` apaterson
2026-03-27  7:04       ` Patrick Steinhardt
