* [PATCH 00/13] Carve out loose object source
@ 2025-10-24 9:55 Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
` (15 more replies)
0 siblings, 16 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:55 UTC (permalink / raw)
To: git
Hi,
this patch series carves out loose object sources. The idea is to store
all data that relates to loose objects in a single structure, similar to
our `struct packfile_store`.
The series is structured as follows:
- Patches 1 to 4 perform some cleanups in the vicinity of object
databases.
- Patches 5 to 8 create a new `struct odb_loose_source` and move all
state that is specific to loose objects into it.
- Patches 9 to 13 then adjust functions to work on top of that new
structure.
The motivation here is to make handling of loose objects completely
self-contained as a step towards pluggable object databases.
Thanks!
Patrick
---
Patrick Steinhardt (13):
odb: fix subtle logic to check whether an alternate is usable
odb: introduce `odb_source_new()`
odb: adjust naming to free object sources
object-file: move `fetch_if_missing`
object-file: introduce `struct odb_loose_source`
object-file: move loose object cache into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object map into loose source
object-file: read objects via the loose object source
object-file: rename `has_loose_object()`
object-file: refactor freshening of objects
object-file: rename `write_object_file()`
object-file: refactor writing objects via a stream
builtin/pack-objects.c | 4 +-
builtin/unpack-objects.c | 7 +-
loose.c | 19 ++---
object-file.c | 175 +++++++++++++++++++++--------------------------
object-file.h | 98 ++++++++++++++------------
object-name.c | 2 +-
odb.c | 104 +++++++++++++++++++---------
odb.h | 41 +++++++----
packfile.c | 16 +++++
packfile.h | 3 +
repository.c | 14 ++--
streaming.c | 11 ++-
12 files changed, 287 insertions(+), 207 deletions(-)
---
base-commit: c54a18ef67e59cdbcd77d6294916d42c98c62d1d
change-id: 20251017-b4-pks-odb-loose-backend-b1e003c41107
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 16:21 ` Junio C Hamano
2025-10-30 10:34 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
` (14 subsequent siblings)
15 siblings, 2 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
When adding an alternate to the object database we first check whether
or not the path is usable. A path is usable if:
- It actually exists.
- We don't have it in our object sources yet.
While the former check is trivial enough, the latter part is somewhat
subtle and prone for bugs. This is because the function doesn't only
check whether or not the given path is usable. But if it _is_ usable, we
also store that path in the map of object sources immediately.
The tricky part here is that the path that gets stored in the map is
_not_ copied. Instead, we rely on the fact that subsequent code uses
`strbuf_detach()` to store the exact same allocated memory in the
created object source. Consequently, the memory is owned by the source
but _also_ stored in the map. This subtlety is easy to miss, so if one
decides to refactor this code one can easily end up breaking this
mechanism.
Make the relationship more explicit by not storing the path as part of
`alt_odb_usable()`. Instead, we store the path after we have created the
source now so that we can use the source's path pointer directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/odb.c b/odb.c
index 00a6e71568b..57d85ed9505 100644
--- a/odb.c
+++ b/odb.c
@@ -86,17 +86,16 @@ int odb_mkstemp(struct object_database *odb,
/*
* Return non-zero iff the path is usable as an alternate object database.
*/
-static int alt_odb_usable(struct object_database *o,
- struct strbuf *path,
- const char *normalized_objdir, khiter_t *pos)
+static int alt_odb_usable(struct object_database *o, const char *path,
+ const char *normalized_objdir)
{
int r;
/* Detect cases where alternate disappeared */
- if (!is_directory(path->buf)) {
+ if (!is_directory(path)) {
error(_("object directory %s does not exist; "
"check .git/objects/info/alternates"),
- path->buf);
+ path);
return 0;
}
@@ -113,11 +112,14 @@ static int alt_odb_usable(struct object_database *o,
assert(r == 1); /* never used */
kh_value(o->source_by_path, p) = o->sources;
}
- if (fspatheq(path->buf, normalized_objdir))
+
+ if (fspatheq(path, normalized_objdir))
+ return 0;
+
+ if (kh_get_odb_path_map(o->source_by_path, path) < kh_end(o->source_by_path))
return 0;
- *pos = kh_put_odb_path_map(o->source_by_path, path->buf, &r);
- /* r: 0 = exists, 1 = never used, 2 = deleted */
- return r == 0 ? 0 : 1;
+
+ return 1;
}
/*
@@ -148,6 +150,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
struct strbuf pathbuf = STRBUF_INIT;
struct strbuf tmp = STRBUF_INIT;
khiter_t pos;
+ int ret;
if (!is_absolute_path(dir) && relative_base) {
strbuf_realpath(&pathbuf, relative_base, 1);
@@ -172,20 +175,21 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
strbuf_reset(&tmp);
strbuf_realpath(&tmp, odb->sources->path, 1);
- if (!alt_odb_usable(odb, &pathbuf, tmp.buf, &pos))
+ if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
CALLOC_ARRAY(alternate, 1);
alternate->odb = odb;
alternate->local = false;
- /* pathbuf.buf is already in r->objects->source_by_path */
alternate->path = strbuf_detach(&pathbuf, NULL);
/* add the alternate entry */
*odb->sources_tail = alternate;
odb->sources_tail = &(alternate->next);
- alternate->next = NULL;
- assert(odb->source_by_path);
+
+ pos = kh_put_odb_path_map(odb->source_by_path, alternate->path, &ret);
+ if (!ret)
+ BUG("source must not yet exist");
kh_value(odb->source_by_path, pos) = alternate;
/* recursively add alternates */
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 02/13] odb: introduce `odb_source_new()`
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 16:37 ` Junio C Hamano
2025-10-24 9:56 ` [PATCH 03/13] odb: adjust naming to free object sources Patrick Steinhardt
` (13 subsequent siblings)
15 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
We have three different locations where we create a new ODB source.
Deduplicate the logic via a new `odb_source_new()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 23 ++++++++++++++++-------
odb.h | 4 ++++
repository.c | 14 +++++++++-----
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/odb.c b/odb.c
index 57d85ed9505..d2d4c514ae5 100644
--- a/odb.c
+++ b/odb.c
@@ -141,6 +141,20 @@ static void read_info_alternates(struct object_database *odb,
const char *relative_base,
int depth);
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local)
+{
+ struct odb_source *source;
+
+ CALLOC_ARRAY(source, 1);
+ source->odb = odb;
+ source->local = local;
+ source->path = xstrdup(path);
+
+ return source;
+}
+
static struct odb_source *link_alt_odb_entry(struct object_database *odb,
const char *dir,
const char *relative_base,
@@ -178,10 +192,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
- CALLOC_ARRAY(alternate, 1);
- alternate->odb = odb;
- alternate->local = false;
- alternate->path = strbuf_detach(&pathbuf, NULL);
+ alternate = odb_source_new(odb, pathbuf.buf, false);
/* add the alternate entry */
*odb->sources_tail = alternate;
@@ -341,9 +352,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
* Make a new primary odb and link the old primary ODB in as an
* alternate
*/
- source = xcalloc(1, sizeof(*source));
- source->odb = odb;
- source->path = xstrdup(dir);
+ source = odb_source_new(odb, dir, false);
/*
* Disable ref updates while a temporary odb is active, since
diff --git a/odb.h b/odb.h
index e6602dd90c8..2bec895d135 100644
--- a/odb.h
+++ b/odb.h
@@ -89,6 +89,10 @@ struct odb_source {
char *path;
};
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local);
+
struct packed_git;
struct packfile_store;
struct cached_object_entry;
diff --git a/repository.c b/repository.c
index 6faf5c73981..6aaa7ba0086 100644
--- a/repository.c
+++ b/repository.c
@@ -160,20 +160,24 @@ void repo_set_gitdir(struct repository *repo,
* until after xstrdup(root). Then we can free it.
*/
char *old_gitdir = repo->gitdir;
+ char *objects_path = NULL;
repo->gitdir = xstrdup(gitfile ? gitfile : root);
free(old_gitdir);
repo_set_commondir(repo, o->commondir);
+ expand_base_dir(&objects_path, o->object_dir,
+ repo->commondir, "objects");
if (!repo->objects->sources) {
- CALLOC_ARRAY(repo->objects->sources, 1);
- repo->objects->sources->odb = repo->objects;
- repo->objects->sources->local = true;
+ repo->objects->sources = odb_source_new(repo->objects,
+ objects_path, true);
repo->objects->sources_tail = &repo->objects->sources->next;
+ free(objects_path);
+ } else {
+ free(repo->objects->sources->path);
+ repo->objects->sources->path = objects_path;
}
- expand_base_dir(&repo->objects->sources->path, o->object_dir,
- repo->commondir, "objects");
repo->objects->sources->disable_ref_updates = o->disable_ref_updates;
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 03/13] odb: adjust naming to free object sources
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-30 10:41 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
` (12 subsequent siblings)
15 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
The functions `free_object_directory()` and `free_object_directories()`
are responsible for freeing a single object source or all object sources
connected to an object database, respectively. The associated structure
has been renamed from `struct object_directory` to `struct odb_source`
recently though, so the names are somewhat stale nowadays.
Rename them to mention the new struct name instead. Furthermore, while
at it, adapt them to our modern naming schema where we first have the
subject followed by a verb.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/odb.c b/odb.c
index d2d4c514ae5..77490d7fdbe 100644
--- a/odb.c
+++ b/odb.c
@@ -365,7 +365,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
return source->next;
}
-static void free_object_directory(struct odb_source *source)
+static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_clear_loose_cache(source);
@@ -387,7 +387,7 @@ void odb_restore_primary_source(struct object_database *odb,
BUG("we expect the old primary object store to be the first alternate");
odb->sources = restore_source;
- free_object_directory(cur_source);
+ odb_source_free(cur_source);
}
char *compute_alternate_path(const char *path, struct strbuf *err)
@@ -1015,13 +1015,13 @@ struct object_database *odb_new(struct repository *repo)
return o;
}
-static void free_object_directories(struct object_database *o)
+static void odb_free_sources(struct object_database *o)
{
while (o->sources) {
struct odb_source *next;
next = o->sources->next;
- free_object_directory(o->sources);
+ odb_source_free(o->sources);
o->sources = next;
}
kh_destroy_odb_path_map(o->source_by_path);
@@ -1039,7 +1039,7 @@ void odb_clear(struct object_database *o)
o->commit_graph = NULL;
o->commit_graph_attempted = 0;
- free_object_directories(o);
+ odb_free_sources(o);
o->sources_tail = NULL;
o->loaded_alternates = 0;
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 04/13] object-file: move `fetch_if_missing`
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (2 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 03/13] odb: adjust naming to free object sources Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
` (11 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
The `fetch_if_missing` global variable is declared in "object-file.h"
but defined in "odb.c". The variable relates to the whole object
database instead of only loose objects, so move the declaration into
"odb.h" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.h | 8 --------
odb.h | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/object-file.h b/object-file.h
index 3fd48dcafbf..097e9764be1 100644
--- a/object-file.h
+++ b/object-file.h
@@ -7,14 +7,6 @@
struct index_state;
-/*
- * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
- * blobs. This has a difference only if extensions.partialClone is set.
- *
- * Its default value is 1.
- */
-extern int fetch_if_missing;
-
enum {
INDEX_WRITE_OBJECT = (1 << 0),
INDEX_FORMAT_CHECK = (1 << 1),
diff --git a/odb.h b/odb.h
index 2bec895d135..2346ffeca85 100644
--- a/odb.h
+++ b/odb.h
@@ -14,6 +14,14 @@ struct strbuf;
struct repository;
struct multi_pack_index;
+/*
+ * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
+ * blobs. This has a difference only if extensions.partialClone is set.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
/*
* Compute the exact path an alternate is at and returns it. In case of
* error NULL is returned and the human readable error is added to `err`
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (3 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-30 10:47 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
` (10 subsequent siblings)
15 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
Currently, all state that relates to loose objects is held directly by
the `struct odb_source`. Introduce a new `struct odb_loose_source` to
hold the state instead so that it is entirely self-contained.
This structure will eventually morph into the backend for accessing
loose objects. As such, this is part of the refactorings to introduce
pluggable object databases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 13 +++++++++++++
object-file.h | 7 +++++++
odb.c | 2 ++
odb.h | 3 +++
4 files changed, 25 insertions(+)
diff --git a/object-file.c b/object-file.c
index 4675c8ed6b6..38e09262e42 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1995,3 +1995,16 @@ void object_file_transaction_commit(struct odb_transaction *transaction)
transaction->odb->transaction = NULL;
free(transaction);
}
+
+struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
+{
+ struct odb_loose_source *loose;
+ CALLOC_ARRAY(loose, 1);
+ loose->source = source;
+ return loose;
+}
+
+void odb_loose_source_free(struct odb_loose_source *source)
+{
+ free(source);
+}
diff --git a/object-file.h b/object-file.h
index 097e9764be1..aa84fc2a752 100644
--- a/object-file.h
+++ b/object-file.h
@@ -18,6 +18,13 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa
struct odb_source;
+struct odb_loose_source {
+ struct odb_source *source;
+};
+
+struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
+void odb_loose_source_free(struct odb_loose_source *source);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 77490d7fdbe..f1b250ceffe 100644
--- a/odb.c
+++ b/odb.c
@@ -151,6 +151,7 @@ struct odb_source *odb_source_new(struct object_database *odb,
source->odb = odb;
source->local = local;
source->path = xstrdup(path);
+ source->loose = odb_loose_source_new(source);
return source;
}
@@ -368,6 +369,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
static void odb_source_free(struct odb_source *source)
{
free(source->path);
+ odb_loose_source_free(source->loose);
odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
diff --git a/odb.h b/odb.h
index 2346ffeca85..f7e1bf87803 100644
--- a/odb.h
+++ b/odb.h
@@ -48,6 +48,9 @@ struct odb_source {
/* Object database that owns this object source. */
struct object_database *odb;
+ /* Private state for loose objects. */
+ struct odb_loose_source *loose;
+
/*
* Used to store the results of readdir(3) calls when we are OK
* sacrificing accuracy due to races for speed. That includes
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 06/13] object-file: move loose object cache into loose source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (4 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 21:44 ` Junio C Hamano
2025-10-24 9:56 ` [PATCH 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
` (9 subsequent siblings)
15 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
`struct odb_source`, but now that we have `struct odb_loose_source` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
relates to `struct odb_loose_source`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
then do a cast inside these functions to `struct odb_loose_source`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 9 +++++----
object-file.c | 35 +++++++++++++++++++----------------
object-file.h | 16 ++++++++++++++--
object-name.c | 2 +-
odb.c | 1 -
odb.h | 12 ------------
6 files changed, 39 insertions(+), 36 deletions(-)
diff --git a/loose.c b/loose.c
index e8ea6e7e24b..8cc7573ff2b 100644
--- a/loose.c
+++ b/loose.c
@@ -1,6 +1,7 @@
#include "git-compat-util.h"
#include "hash.h"
#include "path.h"
+#include "object-file.h"
#include "odb.h"
#include "hex.h"
#include "repository.h"
@@ -54,7 +55,7 @@ static int insert_loose_map(struct odb_source *source,
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
if (inserted)
- oidtree_insert(source->loose_objects_cache, compat_oid);
+ oidtree_insert(source->loose->cache, compat_oid);
return inserted;
}
@@ -66,9 +67,9 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
if (!source->loose_map)
loose_object_map_init(&source->loose_map);
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
insert_loose_map(source, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
diff --git a/object-file.c b/object-file.c
index 38e09262e42..d7c6b1316cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -223,7 +223,7 @@ static int quick_has_loose(struct repository *r,
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_cache(source, oid), oid))
+ if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
return 1;
}
return 0;
@@ -1802,44 +1802,44 @@ static int append_loose_object(const struct object_id *oid,
return 0;
}
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid)
+struct oidtree *odb_loose_source_cache(struct odb_source *source,
+ const struct object_id *oid)
{
int subdir_nr = oid->hash[0];
struct strbuf buf = STRBUF_INIT;
- size_t word_bits = bitsizeof(source->loose_objects_subdir_seen[0]);
+ size_t word_bits = bitsizeof(source->loose->subdir_seen[0]);
size_t word_index = subdir_nr / word_bits;
size_t mask = (size_t)1u << (subdir_nr % word_bits);
uint32_t *bitmap;
if (subdir_nr < 0 ||
- (size_t) subdir_nr >= bitsizeof(source->loose_objects_subdir_seen))
+ (size_t) subdir_nr >= bitsizeof(source->loose->subdir_seen))
BUG("subdir_nr out of range");
- bitmap = &source->loose_objects_subdir_seen[word_index];
+ bitmap = &source->loose->subdir_seen[word_index];
if (*bitmap & mask)
- return source->loose_objects_cache;
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ return source->loose->cache;
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
strbuf_addstr(&buf, source->path);
for_each_file_in_obj_subdir(subdir_nr, &buf,
source->odb->repo->hash_algo,
append_loose_object,
NULL, NULL,
- source->loose_objects_cache);
+ source->loose->cache);
*bitmap |= mask;
strbuf_release(&buf);
- return source->loose_objects_cache;
+ return source->loose->cache;
}
void odb_clear_loose_cache(struct odb_source *source)
{
- oidtree_clear(source->loose_objects_cache);
- FREE_AND_NULL(source->loose_objects_cache);
- memset(&source->loose_objects_subdir_seen, 0,
- sizeof(source->loose_objects_subdir_seen));
+ oidtree_clear(source->loose->cache);
+ FREE_AND_NULL(source->loose->cache);
+ memset(&source->loose->subdir_seen, 0,
+ sizeof(source->loose->subdir_seen));
}
static int check_stream_oid(git_zstream *stream,
@@ -2006,5 +2006,8 @@ struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
void odb_loose_source_free(struct odb_loose_source *source)
{
+ if (!source)
+ return;
+ odb_clear_loose_cache(source->source);
free(source);
}
diff --git a/object-file.h b/object-file.h
index aa84fc2a752..9ecb26a4b35 100644
--- a/object-file.h
+++ b/object-file.h
@@ -20,6 +20,18 @@ struct odb_source;
struct odb_loose_source {
struct odb_source *source;
+
+ /*
+ * Used to store the results of readdir(3) calls when we are OK
+ * sacrificing accuracy due to races for speed. That includes
+ * object existence with OBJECT_INFO_QUICK, as well as
+ * our search for unique abbreviated hashes. Don't use it for tasks
+ * requiring greater accuracy!
+ *
+ * Be sure to call odb_load_loose_cache() before using.
+ */
+ uint32_t subdir_seen[8]; /* 256 bits */
+ struct oidtree *cache;
};
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
@@ -29,8 +41,8 @@ void odb_loose_source_free(struct odb_loose_source *source);
* Populate and return the loose object cache array corresponding to the
* given object ID.
*/
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid);
+struct oidtree *odb_loose_source_cache(struct odb_source *source,
+ const struct object_id *oid);
/* Empty the loose object cache for the specified object directory. */
void odb_clear_loose_cache(struct odb_source *source);
diff --git a/object-name.c b/object-name.c
index f6902e140dd..77e33e693aa 100644
--- a/object-name.c
+++ b/object-name.c
@@ -116,7 +116,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
struct odb_source *source;
for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next)
- oidtree_each(odb_loose_cache(source, &ds->bin_pfx),
+ oidtree_each(odb_loose_source_cache(source, &ds->bin_pfx),
&ds->bin_pfx, ds->len, match_prefix, ds);
}
diff --git a/odb.c b/odb.c
index f1b250ceffe..5a5e770dcd0 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_loose_source_free(source->loose);
- odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index f7e1bf87803..ad57193c66a 100644
--- a/odb.h
+++ b/odb.h
@@ -51,18 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_loose_source *loose;
- /*
- * Used to store the results of readdir(3) calls when we are OK
- * sacrificing accuracy due to races for speed. That includes
- * object existence with OBJECT_INFO_QUICK, as well as
- * our search for unique abbreviated hashes. Don't use it for tasks
- * requiring greater accuracy!
- *
- * Be sure to call odb_load_loose_cache() before using.
- */
- uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
- struct oidtree *loose_objects_cache;
-
/* Map between object IDs for loose objects. */
struct loose_object_map *loose_map;
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 07/13] object-file: hide internals when we need to reprepare loose sources
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (5 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 08/13] object-file: move loose object map into loose source Patrick Steinhardt
` (8 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
There are two different situations where we have to clear the cache of
loose objects:
- When freeing the loose object source itself to avoid memory leaks.
- When repreparing the loose object source so that any potentially-
stale data is getting evicted from the cache.
The former is already handled by `odb_loose_source_free()`. But the
latter case is still done manually by in `odb_reprepare()`, so we are
leaking internals into that code.
Introduce a new `odb_loose_source_reprepare()` function as an equivalent
to `packfile_store_prepare()` to hide these implementation details.
Furthermore, while at it, rename the function `odb_clear_loose_cache()`
to `odb_loose_source_clear()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 17 +++++++++++------
object-file.h | 6 +++---
odb.c | 2 +-
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index d7c6b1316cc..34c7fb434dc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1834,12 +1834,17 @@ struct oidtree *odb_loose_source_cache(struct odb_source *source,
return source->loose->cache;
}
-void odb_clear_loose_cache(struct odb_source *source)
+static void odb_loose_source_clear_cache(struct odb_loose_source *source)
{
- oidtree_clear(source->loose->cache);
- FREE_AND_NULL(source->loose->cache);
- memset(&source->loose->subdir_seen, 0,
- sizeof(source->loose->subdir_seen));
+ oidtree_clear(source->cache);
+ FREE_AND_NULL(source->cache);
+ memset(&source->subdir_seen, 0,
+ sizeof(source->subdir_seen));
+}
+
+void odb_loose_source_reprepare(struct odb_source *source)
+{
+ odb_loose_source_clear_cache(source->loose);
}
static int check_stream_oid(git_zstream *stream,
@@ -2008,6 +2013,6 @@ void odb_loose_source_free(struct odb_loose_source *source)
{
if (!source)
return;
- odb_clear_loose_cache(source->source);
+ odb_loose_source_clear_cache(source);
free(source);
}
diff --git a/object-file.h b/object-file.h
index 9ecb26a4b35..a8d6cf513d6 100644
--- a/object-file.h
+++ b/object-file.h
@@ -37,6 +37,9 @@ struct odb_loose_source {
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
void odb_loose_source_free(struct odb_loose_source *source);
+/* Reprepare the loose source by emptying the loose object cache. */
+void odb_loose_source_reprepare(struct odb_source *source);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -44,9 +47,6 @@ void odb_loose_source_free(struct odb_loose_source *source);
struct oidtree *odb_loose_source_cache(struct odb_source *source,
const struct object_id *oid);
-/* Empty the loose object cache for the specified object directory. */
-void odb_clear_loose_cache(struct odb_source *source);
-
/*
* Put in `buf` the name of the file in the local object database that
* would be used to store a loose object with the specified oid.
diff --git a/odb.c b/odb.c
index 5a5e770dcd0..b4196f0b323 100644
--- a/odb.c
+++ b/odb.c
@@ -1071,7 +1071,7 @@ void odb_reprepare(struct object_database *o)
odb_prepare_alternates(o);
for (source = o->sources; source; source = source->next)
- odb_clear_loose_cache(source);
+ odb_loose_source_reprepare(source);
o->approximate_object_count_valid = 0;
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 08/13] object-file: move loose object map into loose source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (6 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 09/13] object-file: read objects via the loose object source Patrick Steinhardt
` (7 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
The loose object map is used to map from the repository's canonical
object hash to the compatibility hash. As the name indicates, this map
is only used for loose objects, and as such it is tied to a specific
loose object source.
Same as with preceding commits, move this map into the loose object
source accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 10 +++++-----
object-file.c | 1 +
object-file.h | 3 +++
odb.c | 1 -
odb.h | 3 ---
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/loose.c b/loose.c
index 8cc7573ff2b..56cf64b648b 100644
--- a/loose.c
+++ b/loose.c
@@ -49,7 +49,7 @@ static int insert_loose_map(struct odb_source *source,
const struct object_id *oid,
const struct object_id *compat_oid)
{
- struct loose_object_map *map = source->loose_map;
+ struct loose_object_map *map = source->loose->map;
int inserted = 0;
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
@@ -65,8 +65,8 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
FILE *fp;
- if (!source->loose_map)
- loose_object_map_init(&source->loose_map);
+ if (!source->loose->map)
+ loose_object_map_init(&source->loose->map);
if (!source->loose->cache) {
ALLOC_ARRAY(source->loose->cache, 1);
oidtree_init(source->loose->cache);
@@ -125,7 +125,7 @@ int repo_read_loose_object_map(struct repository *repo)
int repo_write_loose_object_map(struct repository *repo)
{
- kh_oid_map_t *map = repo->objects->sources->loose_map->to_compat;
+ kh_oid_map_t *map = repo->objects->sources->loose->map->to_compat;
struct lock_file lock;
int fd;
khiter_t iter;
@@ -231,7 +231,7 @@ int repo_loose_object_map_oid(struct repository *repo,
khiter_t pos;
for (source = repo->objects->sources; source; source = source->next) {
- struct loose_object_map *loose_map = source->loose_map;
+ struct loose_object_map *loose_map = source->loose->map;
if (!loose_map)
continue;
map = (to == repo->compat_hash_algo) ?
diff --git a/object-file.c b/object-file.c
index 34c7fb434dc..14daa2bdd90 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2014,5 +2014,6 @@ void odb_loose_source_free(struct odb_loose_source *source)
if (!source)
return;
odb_loose_source_clear_cache(source);
+ loose_object_map_clear(&source->map);
free(source);
}
diff --git a/object-file.h b/object-file.h
index a8d6cf513d6..706f1e1872d 100644
--- a/object-file.h
+++ b/object-file.h
@@ -32,6 +32,9 @@ struct odb_loose_source {
*/
uint32_t subdir_seen[8]; /* 256 bits */
struct oidtree *cache;
+
+ /* Map between object IDs for loose objects. */
+ struct loose_object_map *map;
};
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
diff --git a/odb.c b/odb.c
index b4196f0b323..96059456f20 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_loose_source_free(source->loose);
- loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index ad57193c66a..25fbcd7d951 100644
--- a/odb.h
+++ b/odb.h
@@ -51,9 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_loose_source *loose;
- /* Map between object IDs for loose objects. */
- struct loose_object_map *loose_map;
-
/*
* private data
*
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 09/13] object-file: read objects via the loose object source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (7 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 08/13] object-file: move loose object map into loose source Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-30 12:19 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
` (6 subsequent siblings)
15 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
When reading an object via `loose_object_info()` or `map_loose_object()`
we hand in the whole repository. We then iterate through each of the
object sources to figure out whether that source has the object in
question.
This logic is reversing responsibility though: a specific backend should
only care about one specific source, where the object sources themselves
are then managed by the object database.
Refactor the code accordingly by passing an object source to both of
these functions instead. The different sources are then handled by
either `do_oid_object_info_extended()`, which sits on the object
database level, and by `open_istream_loose()`. The latter function
arguably is still at the wrong level, but this will be cleaned up at a
later point in time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 68 ++++++++++++++++++++++-------------------------------------
object-file.h | 15 +++++++------
odb.c | 9 ++++++--
streaming.c | 11 +++++++++-
4 files changed, 50 insertions(+), 53 deletions(-)
diff --git a/object-file.c b/object-file.c
index 14daa2bdd90..d9724e3105f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -167,25 +167,22 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
}
/*
- * Find "oid" as a loose object in the local repository or in an alternate.
+ * Find "oid" as a loose object in given source.
* Returns 0 on success, negative on failure.
*
* The "path" out-parameter will give the path of the object we found (if any).
* Note that it may point to static storage and is only valid until another
* call to stat_loose_object().
*/
-static int stat_loose_object(struct repository *r, const struct object_id *oid,
+static int stat_loose_object(struct odb_loose_source *source,
+ const struct object_id *oid,
struct stat *st, const char **path)
{
- struct odb_source *source;
static struct strbuf buf = STRBUF_INIT;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- if (!lstat(*path, st))
- return 0;
- }
+ *path = odb_loose_path(source->source, &buf, oid);
+ if (!lstat(*path, st))
+ return 0;
return -1;
}
@@ -194,39 +191,24 @@ static int stat_loose_object(struct repository *r, const struct object_id *oid,
* Like stat_loose_object(), but actually open the object and return the
* descriptor. See the caveats on the "path" parameter above.
*/
-static int open_loose_object(struct repository *r,
+static int open_loose_object(struct odb_loose_source *source,
const struct object_id *oid, const char **path)
{
- int fd;
- struct odb_source *source;
- int most_interesting_errno = ENOENT;
static struct strbuf buf = STRBUF_INIT;
+ int fd;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- fd = git_open(*path);
- if (fd >= 0)
- return fd;
+ *path = odb_loose_path(source->source, &buf, oid);
+ fd = git_open(*path);
+ if (fd >= 0)
+ return fd;
- if (most_interesting_errno == ENOENT)
- most_interesting_errno = errno;
- }
- errno = most_interesting_errno;
return -1;
}
-static int quick_has_loose(struct repository *r,
+static int quick_has_loose(struct odb_loose_source *source,
const struct object_id *oid)
{
- struct odb_source *source;
-
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
- return 1;
- }
- return 0;
+ return !!oidtree_contains(odb_loose_source_cache(source->source, oid), oid);
}
/*
@@ -252,12 +234,12 @@ static void *map_fd(int fd, const char *path, unsigned long *size)
return map;
}
-void *map_loose_object(struct repository *r,
- const struct object_id *oid,
- unsigned long *size)
+void *odb_loose_source_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size)
{
const char *p;
- int fd = open_loose_object(r, oid, &p);
+ int fd = open_loose_object(source->loose, oid, &p);
if (fd < 0)
return NULL;
@@ -407,9 +389,9 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
return 0;
}
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags)
+int odb_loose_source_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags)
{
int status = 0;
int fd;
@@ -422,7 +404,7 @@ int loose_object_info(struct repository *r,
enum object_type type_scratch;
if (oi->delta_base_oid)
- oidclr(oi->delta_base_oid, r->hash_algo);
+ oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
/*
* If we don't care about type or size, then we don't
@@ -435,15 +417,15 @@ int loose_object_info(struct repository *r,
if (!oi->typep && !oi->sizep && !oi->contentp) {
struct stat st;
if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(r, oid) ? 0 : -1;
- if (stat_loose_object(r, oid, &st, &path) < 0)
+ return quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
return 0;
}
- fd = open_loose_object(r, oid, &path);
+ fd = open_loose_object(source->loose, oid, &path);
if (fd < 0) {
if (errno != ENOENT)
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
diff --git a/object-file.h b/object-file.h
index 706f1e1872d..cc723c9baec 100644
--- a/object-file.h
+++ b/object-file.h
@@ -43,6 +43,14 @@ void odb_loose_source_free(struct odb_loose_source *source);
/* Reprepare the loose source by emptying the loose object cache. */
void odb_loose_source_reprepare(struct odb_source *source);
+int odb_loose_source_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags);
+
+void *odb_loose_source_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,9 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
int has_loose_object(struct odb_source *source,
const struct object_id *oid);
-void *map_loose_object(struct repository *r, const struct object_id *oid,
- unsigned long *size);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
@@ -196,10 +201,6 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
*/
int stream_object_signature(struct repository *r, const struct object_id *oid);
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags);
-
enum finalize_object_file_flags {
FOF_SKIP_COLLISION_CHECK = 1,
};
diff --git a/odb.c b/odb.c
index 96059456f20..5dc1e2c7eb0 100644
--- a/odb.c
+++ b/odb.c
@@ -697,13 +697,18 @@ static int do_oid_object_info_extended(struct object_database *odb,
return 0;
}
+ odb_prepare_alternates(odb);
+
while (1) {
+ struct odb_source *source;
+
if (find_pack_entry(odb->repo, real, &e))
break;
/* Most likely it's a loose object. */
- if (!loose_object_info(odb->repo, real, oi, flags))
- return 0;
+ for (source = odb->sources; source; source = source->next)
+ if (!odb_loose_source_read_object_info(source, real, oi, flags))
+ return 0;
/* Not a loose object; someone else may have just packed it. */
if (!(flags & OBJECT_INFO_QUICK)) {
diff --git a/streaming.c b/streaming.c
index 4b13827668e..8e554abd084 100644
--- a/streaming.c
+++ b/streaming.c
@@ -230,12 +230,21 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
enum object_type *type)
{
struct object_info oi = OBJECT_INFO_INIT;
+ struct odb_source *source;
+
oi.sizep = &st->size;
oi.typep = type;
- st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
+ st->u.loose.mapped = odb_loose_source_map_object(source, oid,
+ &st->u.loose.mapsize);
+ if (st->u.loose.mapped)
+ break;
+ }
if (!st->u.loose.mapped)
return -1;
+
switch (unpack_loose_header(&st->z, st->u.loose.mapped,
st->u.loose.mapsize, st->u.loose.hdr,
sizeof(st->u.loose.hdr))) {
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 10/13] object-file: rename `has_loose_object()`
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (8 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 09/13] object-file: read objects via the loose object source Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 11/13] object-file: refactor freshening of objects Patrick Steinhardt
` (5 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
Rename `has_loose_object()` to `odb_loose_source_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/pack-objects.c | 4 ++--
object-file.c | 6 +++---
object-file.h | 16 ++++++++--------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5bdc44fb2de..c09cb342ee9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1716,7 +1716,7 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
*/
struct odb_source *source = the_repository->objects->sources->next;
for (; source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_loose_source_has_object(source, oid))
return 0;
}
@@ -3980,7 +3980,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
int found = 0;
for (; !found && source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_loose_source_has_object(source, oid))
found = 1;
/*
diff --git a/object-file.c b/object-file.c
index d9724e3105f..979aee32de0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -99,8 +99,8 @@ static int check_and_freshen_source(struct odb_source *source,
return check_and_freshen_file(path.buf, freshen);
}
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid)
+int odb_loose_source_has_object(struct odb_source *source,
+ const struct object_id *oid)
{
return check_and_freshen_source(source, oid, 0);
}
@@ -1161,7 +1161,7 @@ int force_object_loose(struct odb_source *source,
int ret;
for (struct odb_source *s = source->odb->sources; s; s = s->next)
- if (has_loose_object(s, oid))
+ if (odb_loose_source_has_object(s, oid))
return 0;
oi.typep = &type;
diff --git a/object-file.h b/object-file.h
index cc723c9baec..8e0f38d413f 100644
--- a/object-file.h
+++ b/object-file.h
@@ -51,6 +51,14 @@ void *odb_loose_source_map_object(struct odb_source *source,
const struct object_id *oid,
unsigned long *size);
+/*
+ * Return true iff an object database source has a loose object
+ * with the specified name. This function does not respect replace
+ * references.
+ */
+int odb_loose_source_has_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,14 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
struct strbuf *buf,
const struct object_id *oid);
-/*
- * Return true iff an object database source has a loose object
- * with the specified name. This function does not respect replace
- * references.
- */
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 11/13] object-file: refactor freshening of objects
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (9 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
` (4 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
When writing an object that already exists in our object database we
skip the write and instead only update mtimes of the object, either in
its packed or loose object format. This logic is wholly contained in
"object-file.c", but that file is really only concerned with loose
objects. So it does not really make sense that it also contains the
logic to freshen a packed object.
Introduce a new `odb_freshen_object()` function that sits on the object
database level and two functions `packfile_store_freshen_object()` and
`odb_loose_source_freshen_object()`. Like this, the format-specific
functions can be part of their respective subsystems, while the backend
agnostic function to freshen an object sits at the object database
layer.
Note that this change also moves the logic that iterates through object
sources from the object source layer into the object database layer.
This change is intentional: object sources should ideally only have to
worry about themselves, and coordination of different sources should be
handled on the object database level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 33 +++++----------------------------
object-file.h | 3 +++
odb.c | 16 ++++++++++++++++
odb.h | 3 +++
packfile.c | 16 ++++++++++++++++
packfile.h | 3 +++
6 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/object-file.c b/object-file.c
index 979aee32de0..2ca90adc2c7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -968,30 +968,10 @@ static int write_loose_object(struct odb_source *source,
FOF_SKIP_COLLISION_CHECK);
}
-static int freshen_loose_object(struct object_database *odb,
- const struct object_id *oid)
+int odb_loose_source_freshen_object(struct odb_source *source,
+ const struct object_id *oid)
{
- odb_prepare_alternates(odb);
- for (struct odb_source *source = odb->sources; source; source = source->next)
- if (check_and_freshen_source(source, oid, 1))
- return 1;
- return 0;
-}
-
-static int freshen_packed_object(struct object_database *odb,
- const struct object_id *oid)
-{
- struct pack_entry e;
- if (!find_pack_entry(odb->repo, oid, &e))
- return 0;
- if (e.p->is_cruft)
- return 0;
- if (e.p->freshened)
- return 1;
- if (!freshen_file(e.p->pack_name))
- return 0;
- e.p->freshened = 1;
- return 1;
+ return !!check_and_freshen_source(source, oid, 1);
}
int stream_loose_object(struct odb_source *source,
@@ -1073,12 +1053,10 @@ int stream_loose_object(struct odb_source *source,
die(_("deflateEnd on stream object failed (%d)"), ret);
close_loose_object(source, fd, tmp_file.buf);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid)) {
+ if (odb_freshen_object(source->odb, oid)) {
unlink_or_warn(tmp_file.buf);
goto cleanup;
}
-
odb_loose_path(source, &filename, oid);
/* We finally know the object path, and create the missing dir. */
@@ -1137,8 +1115,7 @@ int write_object_file(struct odb_source *source,
* it out into .git/objects/??/?{38} file.
*/
write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid))
+ if (odb_freshen_object(source->odb, oid))
return 0;
if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags))
return -1;
diff --git a/object-file.h b/object-file.h
index 8e0f38d413f..b27c08380d8 100644
--- a/object-file.h
+++ b/object-file.h
@@ -59,6 +59,9 @@ void *odb_loose_source_map_object(struct odb_source *source,
int odb_loose_source_has_object(struct odb_source *source,
const struct object_id *oid);
+int odb_loose_source_freshen_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 5dc1e2c7eb0..6f8f665351b 100644
--- a/odb.c
+++ b/odb.c
@@ -987,6 +987,22 @@ int odb_has_object(struct object_database *odb, const struct object_id *oid,
return odb_read_object_info_extended(odb, oid, NULL, object_info_flags) >= 0;
}
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid)
+{
+ struct odb_source *source;
+
+ if (packfile_store_freshen_object(odb->packfiles, oid))
+ return 1;
+
+ odb_prepare_alternates(odb);
+ for (source = odb->sources; source; source = source->next)
+ if (odb_loose_source_freshen_object(source, oid))
+ return 1;
+
+ return 0;
+}
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect)
{
diff --git a/odb.h b/odb.h
index 25fbcd7d951..8681b7782b4 100644
--- a/odb.h
+++ b/odb.h
@@ -396,6 +396,9 @@ int odb_has_object(struct object_database *odb,
const struct object_id *oid,
unsigned flags);
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid);
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect);
diff --git a/packfile.c b/packfile.c
index 5a7caec2925..2ab49a0beb1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -819,6 +819,22 @@ struct packed_git *packfile_store_load_pack(struct packfile_store *store,
return p;
}
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid)
+{
+ struct pack_entry e;
+ if (!find_pack_entry(store->odb->repo, oid, &e))
+ return 0;
+ if (e.p->is_cruft)
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ if (utime(e.p->pack_name, NULL))
+ return 0;
+ e.p->freshened = 1;
+ return 1;
+}
+
void (*report_garbage)(unsigned seen_bits, const char *path);
static void report_helper(const struct string_list *list,
diff --git a/packfile.h b/packfile.h
index e7a5792b6cf..0ad080046f1 100644
--- a/packfile.h
+++ b/packfile.h
@@ -161,6 +161,9 @@ struct list_head *packfile_store_get_packs_mru(struct packfile_store *store);
struct packed_git *packfile_store_load_pack(struct packfile_store *store,
const char *idx_path, int local);
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid);
+
struct pack_window {
struct pack_window *next;
unsigned char *base;
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 12/13] object-file: rename `write_object_file()`
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (10 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 11/13] object-file: refactor freshening of objects Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
` (3 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
Rename `write_object_file()` to `odb_loose_source_write_object()` so
that it becomes clear that this is tied to a specific loose object
source. This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 8 ++++----
object-file.h | 10 +++++-----
odb.c | 3 ++-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index 2ca90adc2c7..67be5371346 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,10 +1084,10 @@ int stream_loose_object(struct odb_source *source,
return err;
}
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags)
+int odb_loose_source_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags)
{
const struct git_hash_algo *algo = source->odb->repo->hash_algo;
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
diff --git a/object-file.h b/object-file.h
index b27c08380d8..78f0e650d72 100644
--- a/object-file.h
+++ b/object-file.h
@@ -62,6 +62,11 @@ int odb_loose_source_has_object(struct odb_source *source,
int odb_loose_source_freshen_object(struct odb_source *source,
const struct object_id *oid);
+int odb_loose_source_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -168,11 +173,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags);
-
struct input_stream {
const void *(*read)(struct input_stream *, unsigned long *len);
void *data;
diff --git a/odb.c b/odb.c
index 6f8f665351b..432011b4dac 100644
--- a/odb.c
+++ b/odb.c
@@ -1021,7 +1021,8 @@ int odb_write_object_ext(struct object_database *odb,
struct object_id *compat_oid,
unsigned flags)
{
- return write_object_file(odb->sources, buf, len, type, oid, compat_oid, flags);
+ return odb_loose_source_write_object(odb->sources, buf, len, type,
+ oid, compat_oid, flags);
}
struct object_database *odb_new(struct repository *repo)
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 13/13] object-file: refactor writing objects via a stream
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (11 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
@ 2025-10-24 9:56 ` Patrick Steinhardt
2025-10-30 12:24 ` [PATCH 00/13] Carve out loose object source Karthik Nayak
` (2 subsequent siblings)
15 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-24 9:56 UTC (permalink / raw)
To: git
We have two different ways to write an object into the database:
- We either provide the full buffer and write the object all at once.
- Or we provide an input stream that has a `read()` function so that
we can chunk the object.
The latter is especially used for large objects, where it may be too
expensive to hold the complete object in memory all at once.
While we already have `odb_write_object()` at the ODB-layer, we don't
have an equivalent for streaming an object. Introduce a new function
`odb_write_object_stream()` to address this gap so that callers don't
have to be aware of the inner workings of how to stream an object to
disk with a specific object source.
Rename `stream_loose_object()` to `odb_loose_source_write_stream()` to
clarify its scope. This matches our modern best practices around how to
name functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/unpack-objects.c | 7 +++----
object-file.c | 6 +++---
object-file.h | 14 ++++----------
odb.c | 7 +++++++
odb.h | 10 ++++++++++
5 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index ef79e43715d..6fc64e9e4b8 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -363,7 +363,7 @@ struct input_zstream_data {
int status;
};
-static const void *feed_input_zstream(struct input_stream *in_stream,
+static const void *feed_input_zstream(struct odb_write_stream *in_stream,
unsigned long *readlen)
{
struct input_zstream_data *data = in_stream->data;
@@ -393,7 +393,7 @@ static void stream_blob(unsigned long size, unsigned nr)
{
git_zstream zstream = { 0 };
struct input_zstream_data data = { 0 };
- struct input_stream in_stream = {
+ struct odb_write_stream in_stream = {
.read = feed_input_zstream,
.data = &data,
};
@@ -402,8 +402,7 @@ static void stream_blob(unsigned long size, unsigned nr)
data.zstream = &zstream;
git_inflate_init(&zstream);
- if (stream_loose_object(the_repository->objects->sources,
- &in_stream, size, &info->oid))
+ if (odb_write_object_stream(the_repository->objects, &in_stream, size, &info->oid))
die(_("failed to write object in stream"));
if (data.status != Z_STREAM_END)
diff --git a/object-file.c b/object-file.c
index 67be5371346..967284c9ee5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -974,9 +974,9 @@ int odb_loose_source_freshen_object(struct odb_source *source,
return !!check_and_freshen_source(source, oid, 1);
}
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid)
+int odb_loose_source_write_stream(struct odb_source *source,
+ struct odb_write_stream *in_stream, size_t len,
+ struct object_id *oid)
{
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
struct object_id compat_oid;
diff --git a/object-file.h b/object-file.h
index 78f0e650d72..905b0f6c9bb 100644
--- a/object-file.h
+++ b/object-file.h
@@ -67,6 +67,10 @@ int odb_loose_source_write_object(struct odb_source *source,
enum object_type type, struct object_id *oid,
struct object_id *compat_oid_in, unsigned flags);
+int odb_loose_source_write_stream(struct odb_source *source,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -173,16 +177,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-struct input_stream {
- const void *(*read)(struct input_stream *, unsigned long *len);
- void *data;
- int is_finished;
-};
-
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid);
-
int force_object_loose(struct odb_source *source,
const struct object_id *oid, time_t mtime);
diff --git a/odb.c b/odb.c
index 432011b4dac..62d65f71a6d 100644
--- a/odb.c
+++ b/odb.c
@@ -1025,6 +1025,13 @@ int odb_write_object_ext(struct object_database *odb,
oid, compat_oid, flags);
}
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid)
+{
+ return odb_loose_source_write_stream(odb->sources, stream, len, oid);
+}
+
struct object_database *odb_new(struct repository *repo)
{
struct object_database *o = xmalloc(sizeof(*o));
diff --git a/odb.h b/odb.h
index 8681b7782b4..92df20417b9 100644
--- a/odb.h
+++ b/odb.h
@@ -492,4 +492,14 @@ static inline int odb_write_object(struct object_database *odb,
return odb_write_object_ext(odb, buf, len, type, oid, NULL, 0);
}
+struct odb_write_stream {
+ const void *(*read)(struct odb_write_stream *, unsigned long *len);
+ void *data;
+ int is_finished;
+};
+
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
#endif /* ODB_H */
--
2.51.1.930.gacf6e81ea2.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
@ 2025-10-24 16:21 ` Junio C Hamano
2025-10-30 10:34 ` Karthik Nayak
1 sibling, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2025-10-24 16:21 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> +static int alt_odb_usable(struct object_database *o, const char *path,
> + const char *normalized_objdir)
> {
> int r;
>
> /* Detect cases where alternate disappeared */
> - if (!is_directory(path->buf)) {
> + if (!is_directory(path)) {
> error(_("object directory %s does not exist; "
> "check .git/objects/info/alternates"),
> - path->buf);
> + path);
> return 0;
> }
>
> @@ -113,11 +112,14 @@ static int alt_odb_usable(struct object_database *o,
> assert(r == 1); /* never used */
> kh_value(o->source_by_path, p) = o->sources;
> }
> - if (fspatheq(path->buf, normalized_objdir))
> +
> + if (fspatheq(path, normalized_objdir))
> + return 0;
> +
> + if (kh_get_odb_path_map(o->source_by_path, path) < kh_end(o->source_by_path))
> return 0;
> - *pos = kh_put_odb_path_map(o->source_by_path, path->buf, &r);
> - /* r: 0 = exists, 1 = never used, 2 = deleted */
> - return r == 0 ? 0 : 1;
> +
> + return 1;
> }
In other words, when we say "we ask if this is usable", it says "yes
it is usable" or "no it is not", but ...
> @@ -148,6 +150,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
> struct strbuf pathbuf = STRBUF_INIT;
> struct strbuf tmp = STRBUF_INIT;
> khiter_t pos;
> + int ret;
>
> if (!is_absolute_path(dir) && relative_base) {
> strbuf_realpath(&pathbuf, relative_base, 1);
> @@ -172,20 +175,21 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
> strbuf_reset(&tmp);
> strbuf_realpath(&tmp, odb->sources->path, 1);
>
> - if (!alt_odb_usable(odb, &pathbuf, tmp.buf, &pos))
> + if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
> goto error;
>
> CALLOC_ARRAY(alternate, 1);
> alternate->odb = odb;
> alternate->local = false;
> - /* pathbuf.buf is already in r->objects->source_by_path */
> alternate->path = strbuf_detach(&pathbuf, NULL);
>
> /* add the alternate entry */
> *odb->sources_tail = alternate;
> odb->sources_tail = &(alternate->next);
> - alternate->next = NULL;
> - assert(odb->source_by_path);
> +
> + pos = kh_put_odb_path_map(odb->source_by_path, alternate->path, &ret);
> + if (!ret)
> + BUG("source must not yet exist");
... responding to the answer and saying "ok, if it is usable, then
let's use it" is a responsibility of the caller. Which makes sense.
> kh_value(odb->source_by_path, pos) = alternate;
>
> /* recursively add alternates */
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 02/13] odb: introduce `odb_source_new()`
2025-10-24 9:56 ` [PATCH 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
@ 2025-10-24 16:37 ` Junio C Hamano
2025-10-27 11:21 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2025-10-24 16:37 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> diff --git a/repository.c b/repository.c
> index 6faf5c73981..6aaa7ba0086 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -160,20 +160,24 @@ void repo_set_gitdir(struct repository *repo,
> * until after xstrdup(root). Then we can free it.
> */
> char *old_gitdir = repo->gitdir;
> + char *objects_path = NULL;
>
> repo->gitdir = xstrdup(gitfile ? gitfile : root);
> free(old_gitdir);
>
> repo_set_commondir(repo, o->commondir);
> + expand_base_dir(&objects_path, o->object_dir,
> + repo->commondir, "objects");
>
> if (!repo->objects->sources) {
> - CALLOC_ARRAY(repo->objects->sources, 1);
> - repo->objects->sources->odb = repo->objects;
> - repo->objects->sources->local = true;
> + repo->objects->sources = odb_source_new(repo->objects,
> + objects_path, true);
> repo->objects->sources_tail = &repo->objects->sources->next;
> + free(objects_path);
> + } else {
> + free(repo->objects->sources->path);
> + repo->objects->sources->path = objects_path;
> }
> - expand_base_dir(&repo->objects->sources->path, o->object_dir,
> - repo->commondir, "objects");
>
> repo->objects->sources->disable_ref_updates = o->disable_ref_updates;
Everything else is straight-forward, but this one is trickier. When
the .sources structure already exists, we used to just change the
path in it, but now because creation of .sources structure requires
path to be computed beforehand, we need an "else" clause to handle
the "when we already had .sources when we enter this section of
code" case.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 06/13] object-file: move loose object cache into loose source
2025-10-24 9:56 ` [PATCH 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
@ 2025-10-24 21:44 ` Junio C Hamano
2025-10-27 11:21 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2025-10-24 21:44 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> @@ -2006,5 +2006,8 @@ struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
>
> void odb_loose_source_free(struct odb_loose_source *source)
> {
> + if (!source)
> + return;
> + odb_clear_loose_cache(source->source);
> free(source);
> }
This had me confused, especially the source->source part. Perhaps
call the parameter "loose" so loose->source is the way for somebody
who has a odb_loose_source to learn what odb_source it belongs to,
or something. Of course the round-about way to clera the cache that
now belongs to odb_loose_source by taking odb_source looked awkward
in this step, but that awkwardness goes away immediately in the next
step. And the parameter of type "odb_loose_source *" called "source"
here, instead of "loose", still hurts after the next step [07/13] is
applied.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 02/13] odb: introduce `odb_source_new()`
2025-10-24 16:37 ` Junio C Hamano
@ 2025-10-27 11:21 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-27 11:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, Oct 24, 2025 at 09:37:39AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > diff --git a/repository.c b/repository.c
> > index 6faf5c73981..6aaa7ba0086 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -160,20 +160,24 @@ void repo_set_gitdir(struct repository *repo,
> > * until after xstrdup(root). Then we can free it.
> > */
> > char *old_gitdir = repo->gitdir;
> > + char *objects_path = NULL;
> >
> > repo->gitdir = xstrdup(gitfile ? gitfile : root);
> > free(old_gitdir);
> >
> > repo_set_commondir(repo, o->commondir);
> > + expand_base_dir(&objects_path, o->object_dir,
> > + repo->commondir, "objects");
> >
> > if (!repo->objects->sources) {
> > - CALLOC_ARRAY(repo->objects->sources, 1);
> > - repo->objects->sources->odb = repo->objects;
> > - repo->objects->sources->local = true;
> > + repo->objects->sources = odb_source_new(repo->objects,
> > + objects_path, true);
> > repo->objects->sources_tail = &repo->objects->sources->next;
> > + free(objects_path);
> > + } else {
> > + free(repo->objects->sources->path);
> > + repo->objects->sources->path = objects_path;
> > }
> > - expand_base_dir(&repo->objects->sources->path, o->object_dir,
> > - repo->commondir, "objects");
> >
> > repo->objects->sources->disable_ref_updates = o->disable_ref_updates;
>
> Everything else is straight-forward, but this one is trickier. When
> the .sources structure already exists, we used to just change the
> path in it, but now because creation of .sources structure requires
> path to be computed beforehand, we need an "else" clause to handle
> the "when we already had .sources when we enter this section of
> code" case.
Yeah. I plan on revamping this area so that we create the primary object
source only after we have figured out all details about the repository.
But for now we'll continue to reach into the already allocated source
and replace its path.
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 06/13] object-file: move loose object cache into loose source
2025-10-24 21:44 ` Junio C Hamano
@ 2025-10-27 11:21 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-27 11:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, Oct 24, 2025 at 02:44:49PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > @@ -2006,5 +2006,8 @@ struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
> >
> > void odb_loose_source_free(struct odb_loose_source *source)
> > {
> > + if (!source)
> > + return;
> > + odb_clear_loose_cache(source->source);
> > free(source);
> > }
>
> This had me confused, especially the source->source part. Perhaps
> call the parameter "loose" so loose->source is the way for somebody
> who has a odb_loose_source to learn what odb_source it belongs to,
> or something. Of course the round-about way to clera the cache that
> now belongs to odb_loose_source by taking odb_source looked awkward
> in this step, but that awkwardness goes away immediately in the next
> step. And the parameter of type "odb_loose_source *" called "source"
> here, instead of "loose", still hurts after the next step [07/13] is
> applied.
That's fair. Will rename accordingly, thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-10-24 16:21 ` Junio C Hamano
@ 2025-10-30 10:34 ` Karthik Nayak
1 sibling, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2025-10-30 10:34 UTC (permalink / raw)
To: Patrick Steinhardt, git
[-- Attachment #1: Type: text/plain, Size: 4208 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> When adding an alternate to the object database we first check whether
> or not the path is usable. A path is usable if:
>
> - It actually exists.
>
> - We don't have it in our object sources yet.
>
> While the former check is trivial enough, the latter part is somewhat
> subtle and prone for bugs. This is because the function doesn't only
> check whether or not the given path is usable. But if it _is_ usable, we
> also store that path in the map of object sources immediately.
>
> The tricky part here is that the path that gets stored in the map is
> _not_ copied. Instead, we rely on the fact that subsequent code uses
> `strbuf_detach()` to store the exact same allocated memory in the
> created object source. Consequently, the memory is owned by the source
> but _also_ stored in the map. This subtlety is easy to miss, so if one
> decides to refactor this code one can easily end up breaking this
> mechanism.
>
> Make the relationship more explicit by not storing the path as part of
> `alt_odb_usable()`. Instead, we store the path after we have created the
> source now so that we can use the source's path pointer directly.
>
Nit: The last sentence would read a little better with s/we// and s/now//
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb.c | 30 +++++++++++++++++-------------
> 1 file changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/odb.c b/odb.c
> index 00a6e71568b..57d85ed9505 100644
> --- a/odb.c
> +++ b/odb.c
> @@ -86,17 +86,16 @@ int odb_mkstemp(struct object_database *odb,
> /*
> * Return non-zero iff the path is usable as an alternate object database.
> */
> -static int alt_odb_usable(struct object_database *o,
> - struct strbuf *path,
> - const char *normalized_objdir, khiter_t *pos)
> +static int alt_odb_usable(struct object_database *o, const char *path,
> + const char *normalized_objdir)
> {
> int r;
>
> /* Detect cases where alternate disappeared */
> - if (!is_directory(path->buf)) {
> + if (!is_directory(path)) {
> error(_("object directory %s does not exist; "
> "check .git/objects/info/alternates"),
> - path->buf);
> + path);
> return 0;
> }
>
> @@ -113,11 +112,14 @@ static int alt_odb_usable(struct object_database *o,
> assert(r == 1); /* never used */
> kh_value(o->source_by_path, p) = o->sources;
> }
> - if (fspatheq(path->buf, normalized_objdir))
> +
> + if (fspatheq(path, normalized_objdir))
> + return 0;
> +
> + if (kh_get_odb_path_map(o->source_by_path, path) < kh_end(o->source_by_path))
> return 0;
> - *pos = kh_put_odb_path_map(o->source_by_path, path->buf, &r);
> - /* r: 0 = exists, 1 = never used, 2 = deleted */
> - return r == 0 ? 0 : 1;
> +
> + return 1;
> }
>
Okay as part of `alt_obd_usable()` we no longer add the path to the list
of alternates.
> /*
> @@ -148,6 +150,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
> struct strbuf pathbuf = STRBUF_INIT;
> struct strbuf tmp = STRBUF_INIT;
> khiter_t pos;
> + int ret;
>
> if (!is_absolute_path(dir) && relative_base) {
> strbuf_realpath(&pathbuf, relative_base, 1);
> @@ -172,20 +175,21 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
> strbuf_reset(&tmp);
> strbuf_realpath(&tmp, odb->sources->path, 1);
>
> - if (!alt_odb_usable(odb, &pathbuf, tmp.buf, &pos))
> + if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
> goto error;
>
> CALLOC_ARRAY(alternate, 1);
> alternate->odb = odb;
> alternate->local = false;
> - /* pathbuf.buf is already in r->objects->source_by_path */
> alternate->path = strbuf_detach(&pathbuf, NULL);
>
> /* add the alternate entry */
> *odb->sources_tail = alternate;
> odb->sources_tail = &(alternate->next);
> - alternate->next = NULL;
> - assert(odb->source_by_path);
> +
> + pos = kh_put_odb_path_map(odb->source_by_path, alternate->path, &ret);
> + if (!ret)
> + BUG("source must not yet exist");
> kh_value(odb->source_by_path, pos) = alternate;
>
> /* recursively add alternates */
>
Instead we now add it as part of `link_alt_obd_entry()`. This makes a
lot more sense.
> --
> 2.51.1.930.gacf6e81ea2.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 03/13] odb: adjust naming to free object sources
2025-10-24 9:56 ` [PATCH 03/13] odb: adjust naming to free object sources Patrick Steinhardt
@ 2025-10-30 10:41 ` Karthik Nayak
0 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2025-10-30 10:41 UTC (permalink / raw)
To: Patrick Steinhardt, git
[-- Attachment #1: Type: text/plain, Size: 770 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> The functions `free_object_directory()` and `free_object_directories()`
> are responsible for freeing a single object source or all object sources
> connected to an object database, respectively. The associated structure
> has been renamed from `struct object_directory` to `struct odb_source`
> recently though, so the names are somewhat stale nowadays.
>
> Rename them to mention the new struct name instead. Furthermore, while
> at it, adapt them to our modern naming schema where we first have the
> subject followed by a verb.
>
Nit: Would be nice to mention that he renaming was done as part of
a1e2581a1e (object-store: rename `object_directory` to `odb_source`,
2025-07-01).
The patch looks straightforward :)
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-24 9:56 ` [PATCH 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
@ 2025-10-30 10:47 ` Karthik Nayak
2025-10-30 11:32 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Karthik Nayak @ 2025-10-30 10:47 UTC (permalink / raw)
To: Patrick Steinhardt, git
[-- Attachment #1: Type: text/plain, Size: 603 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Currently, all state that relates to loose objects is held directly by
> the `struct odb_source`. Introduce a new `struct odb_loose_source` to
> hold the state instead so that it is entirely self-contained.
>
I wonder if the naming should instead be `struct obd_source_loose` that
way other backends (if added) would be something like:
struct obd_source_loose
struct obd_source_postgres
struct obd_source_mongo
This is easier to read and also for autocompletion it leads nicely into
the 'obd_source_...' namespace.
The patch looks good.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-30 10:47 ` Karthik Nayak
@ 2025-10-30 11:32 ` Patrick Steinhardt
2025-10-31 6:11 ` Patrick Steinhardt
2025-10-31 16:10 ` Junio C Hamano
0 siblings, 2 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-30 11:32 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Thu, Oct 30, 2025 at 03:47:58AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Currently, all state that relates to loose objects is held directly by
> > the `struct odb_source`. Introduce a new `struct odb_loose_source` to
> > hold the state instead so that it is entirely self-contained.
> >
>
> I wonder if the naming should instead be `struct obd_source_loose` that
> way other backends (if added) would be something like:
>
> struct obd_source_loose
> struct obd_source_postgres
> struct obd_source_mongo
>
> This is easier to read and also for autocompletion it leads nicely into
> the 'obd_source_...' namespace.
Hm, I see your point. I think that "loose source" flows a bit more
natural, but I agree that the above is more accessible in code.
Before I change this: does anybody else have an opinion here?
Thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 09/13] object-file: read objects via the loose object source
2025-10-24 9:56 ` [PATCH 09/13] object-file: read objects via the loose object source Patrick Steinhardt
@ 2025-10-30 12:19 ` Karthik Nayak
0 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2025-10-30 12:19 UTC (permalink / raw)
To: Patrick Steinhardt, git
[-- Attachment #1: Type: text/plain, Size: 6157 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> When reading an object via `loose_object_info()` or `map_loose_object()`
> we hand in the whole repository. We then iterate through each of the
> object sources to figure out whether that source has the object in
> question.
>
Okay so earlier these two functions would call `open_loose_object()` and
that function would iterate over all object sources like you mentioned.
> This logic is reversing responsibility though: a specific backend should
> only care about one specific source, where the object sources themselves
> are then managed by the object database.
>
> Refactor the code accordingly by passing an object source to both of
> these functions instead. The different sources are then handled by
> either `do_oid_object_info_extended()`, which sits on the object
> database level, and by `open_istream_loose()`. The latter function
> arguably is still at the wrong level, but this will be cleaned up at a
> later point in time.
And we move this check to a layer above, which can check if any of the
sources contain the object.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> object-file.c | 68 ++++++++++++++++++++++-------------------------------------
> object-file.h | 15 +++++++------
> odb.c | 9 ++++++--
> streaming.c | 11 +++++++++-
> 4 files changed, 50 insertions(+), 53 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 14daa2bdd90..d9724e3105f 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -167,25 +167,22 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
> }
>
> /*
> - * Find "oid" as a loose object in the local repository or in an alternate.
> + * Find "oid" as a loose object in given source.
> * Returns 0 on success, negative on failure.
> *
> * The "path" out-parameter will give the path of the object we found (if any).
> * Note that it may point to static storage and is only valid until another
> * call to stat_loose_object().
> */
> -static int stat_loose_object(struct repository *r, const struct object_id *oid,
> +static int stat_loose_object(struct odb_loose_source *source,
> + const struct object_id *oid,
> struct stat *st, const char **path)
> {
> - struct odb_source *source;
> static struct strbuf buf = STRBUF_INIT;
>
> - odb_prepare_alternates(r->objects);
> - for (source = r->objects->sources; source; source = source->next) {
> - *path = odb_loose_path(source, &buf, oid);
> - if (!lstat(*path, st))
> - return 0;
> - }
> + *path = odb_loose_path(source->source, &buf, oid);
> + if (!lstat(*path, st))
> + return 0;
>
> return -1;
> }
> @@ -194,39 +191,24 @@ static int stat_loose_object(struct repository *r, const struct object_id *oid,
> * Like stat_loose_object(), but actually open the object and return the
> * descriptor. See the caveats on the "path" parameter above.
> */
> -static int open_loose_object(struct repository *r,
> +static int open_loose_object(struct odb_loose_source *source,
> const struct object_id *oid, const char **path)
> {
> - int fd;
> - struct odb_source *source;
> - int most_interesting_errno = ENOENT;
> static struct strbuf buf = STRBUF_INIT;
> + int fd;
>
> - odb_prepare_alternates(r->objects);
> - for (source = r->objects->sources; source; source = source->next) {
> - *path = odb_loose_path(source, &buf, oid);
> - fd = git_open(*path);
> - if (fd >= 0)
> - return fd;
> + *path = odb_loose_path(source->source, &buf, oid);
> + fd = git_open(*path);
> + if (fd >= 0)
> + return fd;
>
> - if (most_interesting_errno == ENOENT)
> - most_interesting_errno = errno;
> - }
> - errno = most_interesting_errno;
> return -1;
> }
>
> -static int quick_has_loose(struct repository *r,
> +static int quick_has_loose(struct odb_loose_source *source,
> const struct object_id *oid)
> {
> - struct odb_source *source;
> -
> - odb_prepare_alternates(r->objects);
> - for (source = r->objects->sources; source; source = source->next) {
> - if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
> - return 1;
> - }
> - return 0;
> + return !!oidtree_contains(odb_loose_source_cache(source->source, oid), oid);
> }
>
These above functions are much simpler now that they have to only worry
about the source level and not the repository level.
[snip]
> diff --git a/odb.c b/odb.c
> index 96059456f20..5dc1e2c7eb0 100644
> --- a/odb.c
> +++ b/odb.c
> @@ -697,13 +697,18 @@ static int do_oid_object_info_extended(struct object_database *odb,
> return 0;
> }
>
> + odb_prepare_alternates(odb);
> +
> while (1) {
> + struct odb_source *source;
> +
> if (find_pack_entry(odb->repo, real, &e))
> break;
>
> /* Most likely it's a loose object. */
> - if (!loose_object_info(odb->repo, real, oi, flags))
> - return 0;
> + for (source = odb->sources; source; source = source->next)
> + if (!odb_loose_source_read_object_info(source, real, oi, flags))
> + return 0;
>
The iteration over all sources is now moved here, makes sense.
> /* Not a loose object; someone else may have just packed it. */
> if (!(flags & OBJECT_INFO_QUICK)) {
> diff --git a/streaming.c b/streaming.c
> index 4b13827668e..8e554abd084 100644
> --- a/streaming.c
> +++ b/streaming.c
> @@ -230,12 +230,21 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
> enum object_type *type)
> {
> struct object_info oi = OBJECT_INFO_INIT;
> + struct odb_source *source;
> +
> oi.sizep = &st->size;
> oi.typep = type;
>
> - st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
> + odb_prepare_alternates(r->objects);
> + for (source = r->objects->sources; source; source = source->next) {
> + st->u.loose.mapped = odb_loose_source_map_object(source, oid,
> + &st->u.loose.mapsize);
> + if (st->u.loose.mapped)
> + break;
> + }
> if (!st->u.loose.mapped)
> return -1;
> +
> switch (unpack_loose_header(&st->z, st->u.loose.mapped,
> st->u.loose.mapsize, st->u.loose.hdr,
> sizeof(st->u.loose.hdr))) {
>
> --
> 2.51.1.930.gacf6e81ea2.dirty
Same here. The patch looks good.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 00/13] Carve out loose object source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (12 preceding siblings ...)
2025-10-24 9:56 ` [PATCH 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
@ 2025-10-30 12:24 ` Karthik Nayak
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
15 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2025-10-30 12:24 UTC (permalink / raw)
To: Patrick Steinhardt, git
[-- Attachment #1: Type: text/plain, Size: 863 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this patch series carves out loose object sources. The idea is to store
> all data that relates to loose objects in a single structure, similar to
> our `struct packfile_store`.
>
> The series is structured as follows:
>
> - Patches 1 to 4 perform some cleanups in the vicinity of object
> databases.
>
> - Patches 5 to 8 create a new `struct odb_loose_source` and move all
> state that is specific to loose objects into it.
>
> - Patches 9 to 13 then adjust functions to work on top of that new
> structure.
>
> The motivation here is to make handling of loose objects completely
> self-contained as a step towards pluggable object databases.
>
> Thanks!
>
> Patrick
>
The series looks good in its current form, I only had one comment. Will
leave it to if it needs to be re-rolled :)
Karthik
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-30 11:32 ` Patrick Steinhardt
@ 2025-10-31 6:11 ` Patrick Steinhardt
2025-10-31 16:16 ` Junio C Hamano
2025-10-31 16:10 ` Junio C Hamano
1 sibling, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:11 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Thu, Oct 30, 2025 at 12:32:21PM +0100, Patrick Steinhardt wrote:
> On Thu, Oct 30, 2025 at 03:47:58AM -0700, Karthik Nayak wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> > > Currently, all state that relates to loose objects is held directly by
> > > the `struct odb_source`. Introduce a new `struct odb_loose_source` to
> > > hold the state instead so that it is entirely self-contained.
> > >
> >
> > I wonder if the naming should instead be `struct obd_source_loose` that
> > way other backends (if added) would be something like:
> >
> > struct obd_source_loose
> > struct obd_source_postgres
> > struct obd_source_mongo
> >
> > This is easier to read and also for autocompletion it leads nicely into
> > the 'obd_source_...' namespace.
>
> Hm, I see your point. I think that "loose source" flows a bit more
> natural, but I agree that the above is more accessible in code.
>
> Before I change this: does anybody else have an opinion here?
I think for now I'll stick to the current naming. This is due to two
reasons:
- As said, I think this flows more naturally in language. When talking
about this you'll say "I'm using the files source" or "I'm using the
whatever source".
- It somewhat matches the naming we have in the reference backends,
where we have `struct reftable_backend` and `struct files_backend`.
That being said I don't feel very strong about this. So I'll post v2
with the current naming, but if either you or somebody else feels
strongly that this should be adjusted I'm happy to adapt.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 00/13] Carve out loose object source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (13 preceding siblings ...)
2025-10-30 12:24 ` [PATCH 00/13] Carve out loose object source Karthik Nayak
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
` (12 more replies)
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
15 siblings, 13 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Hi,
this patch series carves out loose object sources. The idea is to store
all data that relates to loose objects in a single structure, similar to
our `struct packfile_store`.
The series is structured as follows:
- Patches 1 to 4 perform some cleanups in the vicinity of object
databases.
- Patches 5 to 8 create a new `struct odb_loose_source` and move all
state that is specific to loose objects into it.
- Patches 9 to 13 then adjust functions to work on top of that new
structure.
The motivation here is to make handling of loose objects completely
self-contained as a step towards pluggable object databases.
Changes in v2:
- Rename `sturct odb_loose_source *` arguments from `source` to
`loose`.
- Some commit message improvements.
- Link to v1: https://lore.kernel.org/r/20251024-b4-pks-odb-loose-backend-v1-0-1a4202273c38@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (13):
odb: fix subtle logic to check whether an alternate is usable
odb: introduce `odb_source_new()`
odb: adjust naming to free object sources
object-file: move `fetch_if_missing`
object-file: introduce `struct odb_loose_source`
object-file: move loose object cache into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object map into loose source
object-file: read objects via the loose object source
object-file: rename `has_loose_object()`
object-file: refactor freshening of objects
object-file: rename `write_object_file()`
object-file: refactor writing objects via a stream
builtin/pack-objects.c | 4 +-
builtin/unpack-objects.c | 7 +-
loose.c | 19 ++---
object-file.c | 175 +++++++++++++++++++++--------------------------
object-file.h | 98 ++++++++++++++------------
object-name.c | 2 +-
odb.c | 104 +++++++++++++++++++---------
odb.h | 41 +++++++----
packfile.c | 16 +++++
packfile.h | 3 +
repository.c | 14 ++--
streaming.c | 11 ++-
12 files changed, 287 insertions(+), 207 deletions(-)
Range-diff versus v1:
1: b43ef34ba3d ! 1: f0bd0c69c8c odb: fix subtle logic to check whether an alternate is usable
@@ Commit message
mechanism.
Make the relationship more explicit by not storing the path as part of
- `alt_odb_usable()`. Instead, we store the path after we have created the
- source now so that we can use the source's path pointer directly.
+ `alt_odb_usable()`. Instead, store the path after we have created the
+ source so that we can use the source's path pointer directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
2: a56c17be445 = 2: b11d557b69d odb: introduce `odb_source_new()`
3: 8d5d4f98d0c ! 3: eab62039169 odb: adjust naming to free object sources
@@ Commit message
are responsible for freeing a single object source or all object sources
connected to an object database, respectively. The associated structure
has been renamed from `struct object_directory` to `struct odb_source`
- recently though, so the names are somewhat stale nowadays.
+ in a1e2581a1e (object-store: rename `object_directory` to `odb_source`,
+ 2025-07-01) though, so the names are somewhat stale nowadays.
Rename them to mention the new struct name instead. Furthermore, while
at it, adapt them to our modern naming schema where we first have the
4: c2e85285be2 = 4: 6afdd2679d8 object-file: move `fetch_if_missing`
5: 3a532fcac19 ! 5: 7de78e1e32b object-file: introduce `struct odb_loose_source`
@@ object-file.c: void object_file_transaction_commit(struct odb_transaction *trans
+ return loose;
+}
+
-+void odb_loose_source_free(struct odb_loose_source *source)
++void odb_loose_source_free(struct odb_loose_source *loose)
+{
-+ free(source);
++ free(loose);
+}
## object-file.h ##
@@ object-file.h: int index_path(struct index_state *istate, struct object_id *oid,
+};
+
+struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
-+void odb_loose_source_free(struct odb_loose_source *source);
++void odb_loose_source_free(struct odb_loose_source *loose);
+
/*
* Populate and return the loose object cache array corresponding to the
6: 4e15e659e25 ! 6: 3d14e9c3b66 object-file: move loose object cache into loose source
@@ object-file.c: static int append_loose_object(const struct object_id *oid,
static int check_stream_oid(git_zstream *stream,
@@ object-file.c: struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
- void odb_loose_source_free(struct odb_loose_source *source)
+ void odb_loose_source_free(struct odb_loose_source *loose)
{
-+ if (!source)
++ if (!loose)
+ return;
-+ odb_clear_loose_cache(source->source);
- free(source);
++ odb_clear_loose_cache(loose->source);
+ free(loose);
}
## object-file.h ##
@@ object-file.h: struct odb_source;
};
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *source);
+@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
* Populate and return the loose object cache array corresponding to the
* given object ID.
*/
7: 59d08a2b560 ! 7: 2167b592c03 object-file: hide internals when we need to reprepare loose sources
@@ object-file.c: struct oidtree *odb_loose_source_cache(struct odb_source *source,
}
-void odb_clear_loose_cache(struct odb_source *source)
-+static void odb_loose_source_clear_cache(struct odb_loose_source *source)
++static void odb_loose_source_clear_cache(struct odb_loose_source *loose)
{
- oidtree_clear(source->loose->cache);
- FREE_AND_NULL(source->loose->cache);
- memset(&source->loose->subdir_seen, 0,
- sizeof(source->loose->subdir_seen));
-+ oidtree_clear(source->cache);
-+ FREE_AND_NULL(source->cache);
-+ memset(&source->subdir_seen, 0,
-+ sizeof(source->subdir_seen));
++ oidtree_clear(loose->cache);
++ FREE_AND_NULL(loose->cache);
++ memset(&loose->subdir_seen, 0,
++ sizeof(loose->subdir_seen));
+}
+
+void odb_loose_source_reprepare(struct odb_source *source)
@@ object-file.c: struct oidtree *odb_loose_source_cache(struct odb_source *source,
}
static int check_stream_oid(git_zstream *stream,
-@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *source)
+@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *loose)
{
- if (!source)
+ if (!loose)
return;
-- odb_clear_loose_cache(source->source);
-+ odb_loose_source_clear_cache(source);
- free(source);
+- odb_clear_loose_cache(loose->source);
++ odb_loose_source_clear_cache(loose);
+ free(loose);
}
## object-file.h ##
@@ object-file.h: struct odb_loose_source {
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
- void odb_loose_source_free(struct odb_loose_source *source);
+ void odb_loose_source_free(struct odb_loose_source *loose);
+/* Reprepare the loose source by emptying the loose object cache. */
+void odb_loose_source_reprepare(struct odb_source *source);
@@ object-file.h: struct odb_loose_source {
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *source);
+@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
struct oidtree *odb_loose_source_cache(struct odb_source *source,
const struct object_id *oid);
8: 8c04cdc54a0 ! 8: a9a1637a863 object-file: move loose object map into loose source
@@ loose.c: int repo_loose_object_map_oid(struct repository *repo,
map = (to == repo->compat_hash_algo) ?
## object-file.c ##
-@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *source)
- if (!source)
+@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *loose)
+ if (!loose)
return;
- odb_loose_source_clear_cache(source);
-+ loose_object_map_clear(&source->map);
- free(source);
+ odb_loose_source_clear_cache(loose);
++ loose_object_map_clear(&loose->map);
+ free(loose);
}
## object-file.h ##
9: 8847142a317 ! 9: 99d4c039bd5 object-file: read objects via the loose object source
@@ object-file.c: int stream_object_signature(struct repository *r, const struct ob
* call to stat_loose_object().
*/
-static int stat_loose_object(struct repository *r, const struct object_id *oid,
-+static int stat_loose_object(struct odb_loose_source *source,
++static int stat_loose_object(struct odb_loose_source *loose,
+ const struct object_id *oid,
struct stat *st, const char **path)
{
@@ object-file.c: int stream_object_signature(struct repository *r, const struct ob
- if (!lstat(*path, st))
- return 0;
- }
-+ *path = odb_loose_path(source->source, &buf, oid);
++ *path = odb_loose_path(loose->source, &buf, oid);
+ if (!lstat(*path, st))
+ return 0;
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
* descriptor. See the caveats on the "path" parameter above.
*/
-static int open_loose_object(struct repository *r,
-+static int open_loose_object(struct odb_loose_source *source,
++static int open_loose_object(struct odb_loose_source *loose,
const struct object_id *oid, const char **path)
{
- int fd;
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
- fd = git_open(*path);
- if (fd >= 0)
- return fd;
-+ *path = odb_loose_path(source->source, &buf, oid);
++ *path = odb_loose_path(loose->source, &buf, oid);
+ fd = git_open(*path);
+ if (fd >= 0)
+ return fd;
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
}
-static int quick_has_loose(struct repository *r,
-+static int quick_has_loose(struct odb_loose_source *source,
++static int quick_has_loose(struct odb_loose_source *loose,
const struct object_id *oid)
{
- struct odb_source *source;
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
- return 1;
- }
- return 0;
-+ return !!oidtree_contains(odb_loose_source_cache(source->source, oid), oid);
++ return !!oidtree_contains(odb_loose_source_cache(loose->source, oid), oid);
}
/*
@@ object-file.c: int loose_object_info(struct repository *r,
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
## object-file.h ##
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *source);
+@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
/* Reprepare the loose source by emptying the loose object cache. */
void odb_loose_source_reprepare(struct odb_source *source);
10: c1a2d001b37 = 10: 873dedf2efc object-file: rename `has_loose_object()`
11: 2aef29ba790 = 11: 86810514083 object-file: refactor freshening of objects
12: 5c5d6928af2 = 12: 89348b4c540 object-file: rename `write_object_file()`
13: 116f8065a2a = 13: 2dd87a1f135 object-file: refactor writing objects via a stream
---
base-commit: c54a18ef67e59cdbcd77d6294916d42c98c62d1d
change-id: 20251017-b4-pks-odb-loose-backend-b1e003c41107
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 01/13] odb: fix subtle logic to check whether an alternate is usable
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
` (11 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When adding an alternate to the object database we first check whether
or not the path is usable. A path is usable if:
- It actually exists.
- We don't have it in our object sources yet.
While the former check is trivial enough, the latter part is somewhat
subtle and prone for bugs. This is because the function doesn't only
check whether or not the given path is usable. But if it _is_ usable, we
also store that path in the map of object sources immediately.
The tricky part here is that the path that gets stored in the map is
_not_ copied. Instead, we rely on the fact that subsequent code uses
`strbuf_detach()` to store the exact same allocated memory in the
created object source. Consequently, the memory is owned by the source
but _also_ stored in the map. This subtlety is easy to miss, so if one
decides to refactor this code one can easily end up breaking this
mechanism.
Make the relationship more explicit by not storing the path as part of
`alt_odb_usable()`. Instead, store the path after we have created the
source so that we can use the source's path pointer directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/odb.c b/odb.c
index 00a6e71568b..57d85ed9505 100644
--- a/odb.c
+++ b/odb.c
@@ -86,17 +86,16 @@ int odb_mkstemp(struct object_database *odb,
/*
* Return non-zero iff the path is usable as an alternate object database.
*/
-static int alt_odb_usable(struct object_database *o,
- struct strbuf *path,
- const char *normalized_objdir, khiter_t *pos)
+static int alt_odb_usable(struct object_database *o, const char *path,
+ const char *normalized_objdir)
{
int r;
/* Detect cases where alternate disappeared */
- if (!is_directory(path->buf)) {
+ if (!is_directory(path)) {
error(_("object directory %s does not exist; "
"check .git/objects/info/alternates"),
- path->buf);
+ path);
return 0;
}
@@ -113,11 +112,14 @@ static int alt_odb_usable(struct object_database *o,
assert(r == 1); /* never used */
kh_value(o->source_by_path, p) = o->sources;
}
- if (fspatheq(path->buf, normalized_objdir))
+
+ if (fspatheq(path, normalized_objdir))
+ return 0;
+
+ if (kh_get_odb_path_map(o->source_by_path, path) < kh_end(o->source_by_path))
return 0;
- *pos = kh_put_odb_path_map(o->source_by_path, path->buf, &r);
- /* r: 0 = exists, 1 = never used, 2 = deleted */
- return r == 0 ? 0 : 1;
+
+ return 1;
}
/*
@@ -148,6 +150,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
struct strbuf pathbuf = STRBUF_INIT;
struct strbuf tmp = STRBUF_INIT;
khiter_t pos;
+ int ret;
if (!is_absolute_path(dir) && relative_base) {
strbuf_realpath(&pathbuf, relative_base, 1);
@@ -172,20 +175,21 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
strbuf_reset(&tmp);
strbuf_realpath(&tmp, odb->sources->path, 1);
- if (!alt_odb_usable(odb, &pathbuf, tmp.buf, &pos))
+ if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
CALLOC_ARRAY(alternate, 1);
alternate->odb = odb;
alternate->local = false;
- /* pathbuf.buf is already in r->objects->source_by_path */
alternate->path = strbuf_detach(&pathbuf, NULL);
/* add the alternate entry */
*odb->sources_tail = alternate;
odb->sources_tail = &(alternate->next);
- alternate->next = NULL;
- assert(odb->source_by_path);
+
+ pos = kh_put_odb_path_map(odb->source_by_path, alternate->path, &ret);
+ if (!ret)
+ BUG("source must not yet exist");
kh_value(odb->source_by_path, pos) = alternate;
/* recursively add alternates */
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 02/13] odb: introduce `odb_source_new()`
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 03/13] odb: adjust naming to free object sources Patrick Steinhardt
` (10 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
We have three different locations where we create a new ODB source.
Deduplicate the logic via a new `odb_source_new()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 23 ++++++++++++++++-------
odb.h | 4 ++++
repository.c | 14 +++++++++-----
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/odb.c b/odb.c
index 57d85ed9505..d2d4c514ae5 100644
--- a/odb.c
+++ b/odb.c
@@ -141,6 +141,20 @@ static void read_info_alternates(struct object_database *odb,
const char *relative_base,
int depth);
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local)
+{
+ struct odb_source *source;
+
+ CALLOC_ARRAY(source, 1);
+ source->odb = odb;
+ source->local = local;
+ source->path = xstrdup(path);
+
+ return source;
+}
+
static struct odb_source *link_alt_odb_entry(struct object_database *odb,
const char *dir,
const char *relative_base,
@@ -178,10 +192,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
- CALLOC_ARRAY(alternate, 1);
- alternate->odb = odb;
- alternate->local = false;
- alternate->path = strbuf_detach(&pathbuf, NULL);
+ alternate = odb_source_new(odb, pathbuf.buf, false);
/* add the alternate entry */
*odb->sources_tail = alternate;
@@ -341,9 +352,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
* Make a new primary odb and link the old primary ODB in as an
* alternate
*/
- source = xcalloc(1, sizeof(*source));
- source->odb = odb;
- source->path = xstrdup(dir);
+ source = odb_source_new(odb, dir, false);
/*
* Disable ref updates while a temporary odb is active, since
diff --git a/odb.h b/odb.h
index e6602dd90c8..2bec895d135 100644
--- a/odb.h
+++ b/odb.h
@@ -89,6 +89,10 @@ struct odb_source {
char *path;
};
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local);
+
struct packed_git;
struct packfile_store;
struct cached_object_entry;
diff --git a/repository.c b/repository.c
index 6faf5c73981..6aaa7ba0086 100644
--- a/repository.c
+++ b/repository.c
@@ -160,20 +160,24 @@ void repo_set_gitdir(struct repository *repo,
* until after xstrdup(root). Then we can free it.
*/
char *old_gitdir = repo->gitdir;
+ char *objects_path = NULL;
repo->gitdir = xstrdup(gitfile ? gitfile : root);
free(old_gitdir);
repo_set_commondir(repo, o->commondir);
+ expand_base_dir(&objects_path, o->object_dir,
+ repo->commondir, "objects");
if (!repo->objects->sources) {
- CALLOC_ARRAY(repo->objects->sources, 1);
- repo->objects->sources->odb = repo->objects;
- repo->objects->sources->local = true;
+ repo->objects->sources = odb_source_new(repo->objects,
+ objects_path, true);
repo->objects->sources_tail = &repo->objects->sources->next;
+ free(objects_path);
+ } else {
+ free(repo->objects->sources->path);
+ repo->objects->sources->path = objects_path;
}
- expand_base_dir(&repo->objects->sources->path, o->object_dir,
- repo->commondir, "objects");
repo->objects->sources->disable_ref_updates = o->disable_ref_updates;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 03/13] odb: adjust naming to free object sources
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
` (9 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The functions `free_object_directory()` and `free_object_directories()`
are responsible for freeing a single object source or all object sources
connected to an object database, respectively. The associated structure
has been renamed from `struct object_directory` to `struct odb_source`
in a1e2581a1e (object-store: rename `object_directory` to `odb_source`,
2025-07-01) though, so the names are somewhat stale nowadays.
Rename them to mention the new struct name instead. Furthermore, while
at it, adapt them to our modern naming schema where we first have the
subject followed by a verb.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/odb.c b/odb.c
index d2d4c514ae5..77490d7fdbe 100644
--- a/odb.c
+++ b/odb.c
@@ -365,7 +365,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
return source->next;
}
-static void free_object_directory(struct odb_source *source)
+static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_clear_loose_cache(source);
@@ -387,7 +387,7 @@ void odb_restore_primary_source(struct object_database *odb,
BUG("we expect the old primary object store to be the first alternate");
odb->sources = restore_source;
- free_object_directory(cur_source);
+ odb_source_free(cur_source);
}
char *compute_alternate_path(const char *path, struct strbuf *err)
@@ -1015,13 +1015,13 @@ struct object_database *odb_new(struct repository *repo)
return o;
}
-static void free_object_directories(struct object_database *o)
+static void odb_free_sources(struct object_database *o)
{
while (o->sources) {
struct odb_source *next;
next = o->sources->next;
- free_object_directory(o->sources);
+ odb_source_free(o->sources);
o->sources = next;
}
kh_destroy_odb_path_map(o->source_by_path);
@@ -1039,7 +1039,7 @@ void odb_clear(struct object_database *o)
o->commit_graph = NULL;
o->commit_graph_attempted = 0;
- free_object_directories(o);
+ odb_free_sources(o);
o->sources_tail = NULL;
o->loaded_alternates = 0;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 04/13] object-file: move `fetch_if_missing`
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (2 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 03/13] odb: adjust naming to free object sources Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
` (8 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The `fetch_if_missing` global variable is declared in "object-file.h"
but defined in "odb.c". The variable relates to the whole object
database instead of only loose objects, so move the declaration into
"odb.h" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.h | 8 --------
odb.h | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/object-file.h b/object-file.h
index 3fd48dcafbf..097e9764be1 100644
--- a/object-file.h
+++ b/object-file.h
@@ -7,14 +7,6 @@
struct index_state;
-/*
- * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
- * blobs. This has a difference only if extensions.partialClone is set.
- *
- * Its default value is 1.
- */
-extern int fetch_if_missing;
-
enum {
INDEX_WRITE_OBJECT = (1 << 0),
INDEX_FORMAT_CHECK = (1 << 1),
diff --git a/odb.h b/odb.h
index 2bec895d135..2346ffeca85 100644
--- a/odb.h
+++ b/odb.h
@@ -14,6 +14,14 @@ struct strbuf;
struct repository;
struct multi_pack_index;
+/*
+ * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
+ * blobs. This has a difference only if extensions.partialClone is set.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
/*
* Compute the exact path an alternate is at and returns it. In case of
* error NULL is returned and the human readable error is added to `err`
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 05/13] object-file: introduce `struct odb_loose_source`
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (3 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
` (7 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Currently, all state that relates to loose objects is held directly by
the `struct odb_source`. Introduce a new `struct odb_loose_source` to
hold the state instead so that it is entirely self-contained.
This structure will eventually morph into the backend for accessing
loose objects. As such, this is part of the refactorings to introduce
pluggable object databases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 13 +++++++++++++
object-file.h | 7 +++++++
odb.c | 2 ++
odb.h | 3 +++
4 files changed, 25 insertions(+)
diff --git a/object-file.c b/object-file.c
index 4675c8ed6b6..4dfea0ebebd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1995,3 +1995,16 @@ void object_file_transaction_commit(struct odb_transaction *transaction)
transaction->odb->transaction = NULL;
free(transaction);
}
+
+struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
+{
+ struct odb_loose_source *loose;
+ CALLOC_ARRAY(loose, 1);
+ loose->source = source;
+ return loose;
+}
+
+void odb_loose_source_free(struct odb_loose_source *loose)
+{
+ free(loose);
+}
diff --git a/object-file.h b/object-file.h
index 097e9764be1..d9d2de5055e 100644
--- a/object-file.h
+++ b/object-file.h
@@ -18,6 +18,13 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa
struct odb_source;
+struct odb_loose_source {
+ struct odb_source *source;
+};
+
+struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
+void odb_loose_source_free(struct odb_loose_source *loose);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 77490d7fdbe..f1b250ceffe 100644
--- a/odb.c
+++ b/odb.c
@@ -151,6 +151,7 @@ struct odb_source *odb_source_new(struct object_database *odb,
source->odb = odb;
source->local = local;
source->path = xstrdup(path);
+ source->loose = odb_loose_source_new(source);
return source;
}
@@ -368,6 +369,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
static void odb_source_free(struct odb_source *source)
{
free(source->path);
+ odb_loose_source_free(source->loose);
odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
diff --git a/odb.h b/odb.h
index 2346ffeca85..f7e1bf87803 100644
--- a/odb.h
+++ b/odb.h
@@ -48,6 +48,9 @@ struct odb_source {
/* Object database that owns this object source. */
struct object_database *odb;
+ /* Private state for loose objects. */
+ struct odb_loose_source *loose;
+
/*
* Used to store the results of readdir(3) calls when we are OK
* sacrificing accuracy due to races for speed. That includes
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 06/13] object-file: move loose object cache into loose source
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (4 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
` (6 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
`struct odb_source`, but now that we have `struct odb_loose_source` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
relates to `struct odb_loose_source`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
then do a cast inside these functions to `struct odb_loose_source`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 9 +++++----
object-file.c | 35 +++++++++++++++++++----------------
object-file.h | 16 ++++++++++++++--
object-name.c | 2 +-
odb.c | 1 -
odb.h | 12 ------------
6 files changed, 39 insertions(+), 36 deletions(-)
diff --git a/loose.c b/loose.c
index e8ea6e7e24b..8cc7573ff2b 100644
--- a/loose.c
+++ b/loose.c
@@ -1,6 +1,7 @@
#include "git-compat-util.h"
#include "hash.h"
#include "path.h"
+#include "object-file.h"
#include "odb.h"
#include "hex.h"
#include "repository.h"
@@ -54,7 +55,7 @@ static int insert_loose_map(struct odb_source *source,
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
if (inserted)
- oidtree_insert(source->loose_objects_cache, compat_oid);
+ oidtree_insert(source->loose->cache, compat_oid);
return inserted;
}
@@ -66,9 +67,9 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
if (!source->loose_map)
loose_object_map_init(&source->loose_map);
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
insert_loose_map(source, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
diff --git a/object-file.c b/object-file.c
index 4dfea0ebebd..8a3db2877ee 100644
--- a/object-file.c
+++ b/object-file.c
@@ -223,7 +223,7 @@ static int quick_has_loose(struct repository *r,
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_cache(source, oid), oid))
+ if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
return 1;
}
return 0;
@@ -1802,44 +1802,44 @@ static int append_loose_object(const struct object_id *oid,
return 0;
}
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid)
+struct oidtree *odb_loose_source_cache(struct odb_source *source,
+ const struct object_id *oid)
{
int subdir_nr = oid->hash[0];
struct strbuf buf = STRBUF_INIT;
- size_t word_bits = bitsizeof(source->loose_objects_subdir_seen[0]);
+ size_t word_bits = bitsizeof(source->loose->subdir_seen[0]);
size_t word_index = subdir_nr / word_bits;
size_t mask = (size_t)1u << (subdir_nr % word_bits);
uint32_t *bitmap;
if (subdir_nr < 0 ||
- (size_t) subdir_nr >= bitsizeof(source->loose_objects_subdir_seen))
+ (size_t) subdir_nr >= bitsizeof(source->loose->subdir_seen))
BUG("subdir_nr out of range");
- bitmap = &source->loose_objects_subdir_seen[word_index];
+ bitmap = &source->loose->subdir_seen[word_index];
if (*bitmap & mask)
- return source->loose_objects_cache;
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ return source->loose->cache;
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
strbuf_addstr(&buf, source->path);
for_each_file_in_obj_subdir(subdir_nr, &buf,
source->odb->repo->hash_algo,
append_loose_object,
NULL, NULL,
- source->loose_objects_cache);
+ source->loose->cache);
*bitmap |= mask;
strbuf_release(&buf);
- return source->loose_objects_cache;
+ return source->loose->cache;
}
void odb_clear_loose_cache(struct odb_source *source)
{
- oidtree_clear(source->loose_objects_cache);
- FREE_AND_NULL(source->loose_objects_cache);
- memset(&source->loose_objects_subdir_seen, 0,
- sizeof(source->loose_objects_subdir_seen));
+ oidtree_clear(source->loose->cache);
+ FREE_AND_NULL(source->loose->cache);
+ memset(&source->loose->subdir_seen, 0,
+ sizeof(source->loose->subdir_seen));
}
static int check_stream_oid(git_zstream *stream,
@@ -2006,5 +2006,8 @@ struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
void odb_loose_source_free(struct odb_loose_source *loose)
{
+ if (!loose)
+ return;
+ odb_clear_loose_cache(loose->source);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index d9d2de5055e..887b48725a0 100644
--- a/object-file.h
+++ b/object-file.h
@@ -20,6 +20,18 @@ struct odb_source;
struct odb_loose_source {
struct odb_source *source;
+
+ /*
+ * Used to store the results of readdir(3) calls when we are OK
+ * sacrificing accuracy due to races for speed. That includes
+ * object existence with OBJECT_INFO_QUICK, as well as
+ * our search for unique abbreviated hashes. Don't use it for tasks
+ * requiring greater accuracy!
+ *
+ * Be sure to call odb_load_loose_cache() before using.
+ */
+ uint32_t subdir_seen[8]; /* 256 bits */
+ struct oidtree *cache;
};
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
@@ -29,8 +41,8 @@ void odb_loose_source_free(struct odb_loose_source *loose);
* Populate and return the loose object cache array corresponding to the
* given object ID.
*/
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid);
+struct oidtree *odb_loose_source_cache(struct odb_source *source,
+ const struct object_id *oid);
/* Empty the loose object cache for the specified object directory. */
void odb_clear_loose_cache(struct odb_source *source);
diff --git a/object-name.c b/object-name.c
index f6902e140dd..77e33e693aa 100644
--- a/object-name.c
+++ b/object-name.c
@@ -116,7 +116,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
struct odb_source *source;
for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next)
- oidtree_each(odb_loose_cache(source, &ds->bin_pfx),
+ oidtree_each(odb_loose_source_cache(source, &ds->bin_pfx),
&ds->bin_pfx, ds->len, match_prefix, ds);
}
diff --git a/odb.c b/odb.c
index f1b250ceffe..5a5e770dcd0 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_loose_source_free(source->loose);
- odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index f7e1bf87803..ad57193c66a 100644
--- a/odb.h
+++ b/odb.h
@@ -51,18 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_loose_source *loose;
- /*
- * Used to store the results of readdir(3) calls when we are OK
- * sacrificing accuracy due to races for speed. That includes
- * object existence with OBJECT_INFO_QUICK, as well as
- * our search for unique abbreviated hashes. Don't use it for tasks
- * requiring greater accuracy!
- *
- * Be sure to call odb_load_loose_cache() before using.
- */
- uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
- struct oidtree *loose_objects_cache;
-
/* Map between object IDs for loose objects. */
struct loose_object_map *loose_map;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 07/13] object-file: hide internals when we need to reprepare loose sources
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (5 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 08/13] object-file: move loose object map into loose source Patrick Steinhardt
` (5 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
There are two different situations where we have to clear the cache of
loose objects:
- When freeing the loose object source itself to avoid memory leaks.
- When repreparing the loose object source so that any potentially-
stale data is getting evicted from the cache.
The former is already handled by `odb_loose_source_free()`. But the
latter case is still done manually by in `odb_reprepare()`, so we are
leaking internals into that code.
Introduce a new `odb_loose_source_reprepare()` function as an equivalent
to `packfile_store_prepare()` to hide these implementation details.
Furthermore, while at it, rename the function `odb_clear_loose_cache()`
to `odb_loose_source_clear()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 17 +++++++++++------
object-file.h | 6 +++---
odb.c | 2 +-
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index 8a3db2877ee..2b908dc1215 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1834,12 +1834,17 @@ struct oidtree *odb_loose_source_cache(struct odb_source *source,
return source->loose->cache;
}
-void odb_clear_loose_cache(struct odb_source *source)
+static void odb_loose_source_clear_cache(struct odb_loose_source *loose)
{
- oidtree_clear(source->loose->cache);
- FREE_AND_NULL(source->loose->cache);
- memset(&source->loose->subdir_seen, 0,
- sizeof(source->loose->subdir_seen));
+ oidtree_clear(loose->cache);
+ FREE_AND_NULL(loose->cache);
+ memset(&loose->subdir_seen, 0,
+ sizeof(loose->subdir_seen));
+}
+
+void odb_loose_source_reprepare(struct odb_source *source)
+{
+ odb_loose_source_clear_cache(source->loose);
}
static int check_stream_oid(git_zstream *stream,
@@ -2008,6 +2013,6 @@ void odb_loose_source_free(struct odb_loose_source *loose)
{
if (!loose)
return;
- odb_clear_loose_cache(loose->source);
+ odb_loose_source_clear_cache(loose);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index 887b48725a0..6cbb4b44352 100644
--- a/object-file.h
+++ b/object-file.h
@@ -37,6 +37,9 @@ struct odb_loose_source {
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
void odb_loose_source_free(struct odb_loose_source *loose);
+/* Reprepare the loose source by emptying the loose object cache. */
+void odb_loose_source_reprepare(struct odb_source *source);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -44,9 +47,6 @@ void odb_loose_source_free(struct odb_loose_source *loose);
struct oidtree *odb_loose_source_cache(struct odb_source *source,
const struct object_id *oid);
-/* Empty the loose object cache for the specified object directory. */
-void odb_clear_loose_cache(struct odb_source *source);
-
/*
* Put in `buf` the name of the file in the local object database that
* would be used to store a loose object with the specified oid.
diff --git a/odb.c b/odb.c
index 5a5e770dcd0..b4196f0b323 100644
--- a/odb.c
+++ b/odb.c
@@ -1071,7 +1071,7 @@ void odb_reprepare(struct object_database *o)
odb_prepare_alternates(o);
for (source = o->sources; source; source = source->next)
- odb_clear_loose_cache(source);
+ odb_loose_source_reprepare(source);
o->approximate_object_count_valid = 0;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 08/13] object-file: move loose object map into loose source
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (6 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 09/13] object-file: read objects via the loose object source Patrick Steinhardt
` (4 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The loose object map is used to map from the repository's canonical
object hash to the compatibility hash. As the name indicates, this map
is only used for loose objects, and as such it is tied to a specific
loose object source.
Same as with preceding commits, move this map into the loose object
source accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 10 +++++-----
object-file.c | 1 +
object-file.h | 3 +++
odb.c | 1 -
odb.h | 3 ---
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/loose.c b/loose.c
index 8cc7573ff2b..56cf64b648b 100644
--- a/loose.c
+++ b/loose.c
@@ -49,7 +49,7 @@ static int insert_loose_map(struct odb_source *source,
const struct object_id *oid,
const struct object_id *compat_oid)
{
- struct loose_object_map *map = source->loose_map;
+ struct loose_object_map *map = source->loose->map;
int inserted = 0;
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
@@ -65,8 +65,8 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
FILE *fp;
- if (!source->loose_map)
- loose_object_map_init(&source->loose_map);
+ if (!source->loose->map)
+ loose_object_map_init(&source->loose->map);
if (!source->loose->cache) {
ALLOC_ARRAY(source->loose->cache, 1);
oidtree_init(source->loose->cache);
@@ -125,7 +125,7 @@ int repo_read_loose_object_map(struct repository *repo)
int repo_write_loose_object_map(struct repository *repo)
{
- kh_oid_map_t *map = repo->objects->sources->loose_map->to_compat;
+ kh_oid_map_t *map = repo->objects->sources->loose->map->to_compat;
struct lock_file lock;
int fd;
khiter_t iter;
@@ -231,7 +231,7 @@ int repo_loose_object_map_oid(struct repository *repo,
khiter_t pos;
for (source = repo->objects->sources; source; source = source->next) {
- struct loose_object_map *loose_map = source->loose_map;
+ struct loose_object_map *loose_map = source->loose->map;
if (!loose_map)
continue;
map = (to == repo->compat_hash_algo) ?
diff --git a/object-file.c b/object-file.c
index 2b908dc1215..e8877876d77 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2014,5 +2014,6 @@ void odb_loose_source_free(struct odb_loose_source *loose)
if (!loose)
return;
odb_loose_source_clear_cache(loose);
+ loose_object_map_clear(&loose->map);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index 6cbb4b44352..74bae27612f 100644
--- a/object-file.h
+++ b/object-file.h
@@ -32,6 +32,9 @@ struct odb_loose_source {
*/
uint32_t subdir_seen[8]; /* 256 bits */
struct oidtree *cache;
+
+ /* Map between object IDs for loose objects. */
+ struct loose_object_map *map;
};
struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
diff --git a/odb.c b/odb.c
index b4196f0b323..96059456f20 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_loose_source_free(source->loose);
- loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index ad57193c66a..25fbcd7d951 100644
--- a/odb.h
+++ b/odb.h
@@ -51,9 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_loose_source *loose;
- /* Map between object IDs for loose objects. */
- struct loose_object_map *loose_map;
-
/*
* private data
*
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 09/13] object-file: read objects via the loose object source
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (7 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 08/13] object-file: move loose object map into loose source Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
` (3 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When reading an object via `loose_object_info()` or `map_loose_object()`
we hand in the whole repository. We then iterate through each of the
object sources to figure out whether that source has the object in
question.
This logic is reversing responsibility though: a specific backend should
only care about one specific source, where the object sources themselves
are then managed by the object database.
Refactor the code accordingly by passing an object source to both of
these functions instead. The different sources are then handled by
either `do_oid_object_info_extended()`, which sits on the object
database level, and by `open_istream_loose()`. The latter function
arguably is still at the wrong level, but this will be cleaned up at a
later point in time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 68 ++++++++++++++++++++++-------------------------------------
object-file.h | 15 +++++++------
odb.c | 9 ++++++--
streaming.c | 11 +++++++++-
4 files changed, 50 insertions(+), 53 deletions(-)
diff --git a/object-file.c b/object-file.c
index e8877876d77..64a3d45376a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -167,25 +167,22 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
}
/*
- * Find "oid" as a loose object in the local repository or in an alternate.
+ * Find "oid" as a loose object in given source.
* Returns 0 on success, negative on failure.
*
* The "path" out-parameter will give the path of the object we found (if any).
* Note that it may point to static storage and is only valid until another
* call to stat_loose_object().
*/
-static int stat_loose_object(struct repository *r, const struct object_id *oid,
+static int stat_loose_object(struct odb_loose_source *loose,
+ const struct object_id *oid,
struct stat *st, const char **path)
{
- struct odb_source *source;
static struct strbuf buf = STRBUF_INIT;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- if (!lstat(*path, st))
- return 0;
- }
+ *path = odb_loose_path(loose->source, &buf, oid);
+ if (!lstat(*path, st))
+ return 0;
return -1;
}
@@ -194,39 +191,24 @@ static int stat_loose_object(struct repository *r, const struct object_id *oid,
* Like stat_loose_object(), but actually open the object and return the
* descriptor. See the caveats on the "path" parameter above.
*/
-static int open_loose_object(struct repository *r,
+static int open_loose_object(struct odb_loose_source *loose,
const struct object_id *oid, const char **path)
{
- int fd;
- struct odb_source *source;
- int most_interesting_errno = ENOENT;
static struct strbuf buf = STRBUF_INIT;
+ int fd;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- fd = git_open(*path);
- if (fd >= 0)
- return fd;
+ *path = odb_loose_path(loose->source, &buf, oid);
+ fd = git_open(*path);
+ if (fd >= 0)
+ return fd;
- if (most_interesting_errno == ENOENT)
- most_interesting_errno = errno;
- }
- errno = most_interesting_errno;
return -1;
}
-static int quick_has_loose(struct repository *r,
+static int quick_has_loose(struct odb_loose_source *loose,
const struct object_id *oid)
{
- struct odb_source *source;
-
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
- return 1;
- }
- return 0;
+ return !!oidtree_contains(odb_loose_source_cache(loose->source, oid), oid);
}
/*
@@ -252,12 +234,12 @@ static void *map_fd(int fd, const char *path, unsigned long *size)
return map;
}
-void *map_loose_object(struct repository *r,
- const struct object_id *oid,
- unsigned long *size)
+void *odb_loose_source_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size)
{
const char *p;
- int fd = open_loose_object(r, oid, &p);
+ int fd = open_loose_object(source->loose, oid, &p);
if (fd < 0)
return NULL;
@@ -407,9 +389,9 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
return 0;
}
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags)
+int odb_loose_source_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags)
{
int status = 0;
int fd;
@@ -422,7 +404,7 @@ int loose_object_info(struct repository *r,
enum object_type type_scratch;
if (oi->delta_base_oid)
- oidclr(oi->delta_base_oid, r->hash_algo);
+ oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
/*
* If we don't care about type or size, then we don't
@@ -435,15 +417,15 @@ int loose_object_info(struct repository *r,
if (!oi->typep && !oi->sizep && !oi->contentp) {
struct stat st;
if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(r, oid) ? 0 : -1;
- if (stat_loose_object(r, oid, &st, &path) < 0)
+ return quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
return 0;
}
- fd = open_loose_object(r, oid, &path);
+ fd = open_loose_object(source->loose, oid, &path);
if (fd < 0) {
if (errno != ENOENT)
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
diff --git a/object-file.h b/object-file.h
index 74bae27612f..58ec22d187e 100644
--- a/object-file.h
+++ b/object-file.h
@@ -43,6 +43,14 @@ void odb_loose_source_free(struct odb_loose_source *loose);
/* Reprepare the loose source by emptying the loose object cache. */
void odb_loose_source_reprepare(struct odb_source *source);
+int odb_loose_source_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags);
+
+void *odb_loose_source_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,9 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
int has_loose_object(struct odb_source *source,
const struct object_id *oid);
-void *map_loose_object(struct repository *r, const struct object_id *oid,
- unsigned long *size);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
@@ -196,10 +201,6 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
*/
int stream_object_signature(struct repository *r, const struct object_id *oid);
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags);
-
enum finalize_object_file_flags {
FOF_SKIP_COLLISION_CHECK = 1,
};
diff --git a/odb.c b/odb.c
index 96059456f20..5dc1e2c7eb0 100644
--- a/odb.c
+++ b/odb.c
@@ -697,13 +697,18 @@ static int do_oid_object_info_extended(struct object_database *odb,
return 0;
}
+ odb_prepare_alternates(odb);
+
while (1) {
+ struct odb_source *source;
+
if (find_pack_entry(odb->repo, real, &e))
break;
/* Most likely it's a loose object. */
- if (!loose_object_info(odb->repo, real, oi, flags))
- return 0;
+ for (source = odb->sources; source; source = source->next)
+ if (!odb_loose_source_read_object_info(source, real, oi, flags))
+ return 0;
/* Not a loose object; someone else may have just packed it. */
if (!(flags & OBJECT_INFO_QUICK)) {
diff --git a/streaming.c b/streaming.c
index 4b13827668e..8e554abd084 100644
--- a/streaming.c
+++ b/streaming.c
@@ -230,12 +230,21 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
enum object_type *type)
{
struct object_info oi = OBJECT_INFO_INIT;
+ struct odb_source *source;
+
oi.sizep = &st->size;
oi.typep = type;
- st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
+ st->u.loose.mapped = odb_loose_source_map_object(source, oid,
+ &st->u.loose.mapsize);
+ if (st->u.loose.mapped)
+ break;
+ }
if (!st->u.loose.mapped)
return -1;
+
switch (unpack_loose_header(&st->z, st->u.loose.mapped,
st->u.loose.mapsize, st->u.loose.hdr,
sizeof(st->u.loose.hdr))) {
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 10/13] object-file: rename `has_loose_object()`
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (8 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 09/13] object-file: read objects via the loose object source Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 11/13] object-file: refactor freshening of objects Patrick Steinhardt
` (2 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Rename `has_loose_object()` to `odb_loose_source_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/pack-objects.c | 4 ++--
object-file.c | 6 +++---
object-file.h | 16 ++++++++--------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5bdc44fb2de..c09cb342ee9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1716,7 +1716,7 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
*/
struct odb_source *source = the_repository->objects->sources->next;
for (; source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_loose_source_has_object(source, oid))
return 0;
}
@@ -3980,7 +3980,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
int found = 0;
for (; !found && source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_loose_source_has_object(source, oid))
found = 1;
/*
diff --git a/object-file.c b/object-file.c
index 64a3d45376a..0255d757ba1 100644
--- a/object-file.c
+++ b/object-file.c
@@ -99,8 +99,8 @@ static int check_and_freshen_source(struct odb_source *source,
return check_and_freshen_file(path.buf, freshen);
}
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid)
+int odb_loose_source_has_object(struct odb_source *source,
+ const struct object_id *oid)
{
return check_and_freshen_source(source, oid, 0);
}
@@ -1161,7 +1161,7 @@ int force_object_loose(struct odb_source *source,
int ret;
for (struct odb_source *s = source->odb->sources; s; s = s->next)
- if (has_loose_object(s, oid))
+ if (odb_loose_source_has_object(s, oid))
return 0;
oi.typep = &type;
diff --git a/object-file.h b/object-file.h
index 58ec22d187e..e6daa566f32 100644
--- a/object-file.h
+++ b/object-file.h
@@ -51,6 +51,14 @@ void *odb_loose_source_map_object(struct odb_source *source,
const struct object_id *oid,
unsigned long *size);
+/*
+ * Return true iff an object database source has a loose object
+ * with the specified name. This function does not respect replace
+ * references.
+ */
+int odb_loose_source_has_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,14 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
struct strbuf *buf,
const struct object_id *oid);
-/*
- * Return true iff an object database source has a loose object
- * with the specified name. This function does not respect replace
- * references.
- */
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 11/13] object-file: refactor freshening of objects
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (9 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When writing an object that already exists in our object database we
skip the write and instead only update mtimes of the object, either in
its packed or loose object format. This logic is wholly contained in
"object-file.c", but that file is really only concerned with loose
objects. So it does not really make sense that it also contains the
logic to freshen a packed object.
Introduce a new `odb_freshen_object()` function that sits on the object
database level and two functions `packfile_store_freshen_object()` and
`odb_loose_source_freshen_object()`. Like this, the format-specific
functions can be part of their respective subsystems, while the backend
agnostic function to freshen an object sits at the object database
layer.
Note that this change also moves the logic that iterates through object
sources from the object source layer into the object database layer.
This change is intentional: object sources should ideally only have to
worry about themselves, and coordination of different sources should be
handled on the object database level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 33 +++++----------------------------
object-file.h | 3 +++
odb.c | 16 ++++++++++++++++
odb.h | 3 +++
packfile.c | 16 ++++++++++++++++
packfile.h | 3 +++
6 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/object-file.c b/object-file.c
index 0255d757ba1..5ea24de205d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -968,30 +968,10 @@ static int write_loose_object(struct odb_source *source,
FOF_SKIP_COLLISION_CHECK);
}
-static int freshen_loose_object(struct object_database *odb,
- const struct object_id *oid)
+int odb_loose_source_freshen_object(struct odb_source *source,
+ const struct object_id *oid)
{
- odb_prepare_alternates(odb);
- for (struct odb_source *source = odb->sources; source; source = source->next)
- if (check_and_freshen_source(source, oid, 1))
- return 1;
- return 0;
-}
-
-static int freshen_packed_object(struct object_database *odb,
- const struct object_id *oid)
-{
- struct pack_entry e;
- if (!find_pack_entry(odb->repo, oid, &e))
- return 0;
- if (e.p->is_cruft)
- return 0;
- if (e.p->freshened)
- return 1;
- if (!freshen_file(e.p->pack_name))
- return 0;
- e.p->freshened = 1;
- return 1;
+ return !!check_and_freshen_source(source, oid, 1);
}
int stream_loose_object(struct odb_source *source,
@@ -1073,12 +1053,10 @@ int stream_loose_object(struct odb_source *source,
die(_("deflateEnd on stream object failed (%d)"), ret);
close_loose_object(source, fd, tmp_file.buf);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid)) {
+ if (odb_freshen_object(source->odb, oid)) {
unlink_or_warn(tmp_file.buf);
goto cleanup;
}
-
odb_loose_path(source, &filename, oid);
/* We finally know the object path, and create the missing dir. */
@@ -1137,8 +1115,7 @@ int write_object_file(struct odb_source *source,
* it out into .git/objects/??/?{38} file.
*/
write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid))
+ if (odb_freshen_object(source->odb, oid))
return 0;
if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags))
return -1;
diff --git a/object-file.h b/object-file.h
index e6daa566f32..b11a9b95498 100644
--- a/object-file.h
+++ b/object-file.h
@@ -59,6 +59,9 @@ void *odb_loose_source_map_object(struct odb_source *source,
int odb_loose_source_has_object(struct odb_source *source,
const struct object_id *oid);
+int odb_loose_source_freshen_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 5dc1e2c7eb0..6f8f665351b 100644
--- a/odb.c
+++ b/odb.c
@@ -987,6 +987,22 @@ int odb_has_object(struct object_database *odb, const struct object_id *oid,
return odb_read_object_info_extended(odb, oid, NULL, object_info_flags) >= 0;
}
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid)
+{
+ struct odb_source *source;
+
+ if (packfile_store_freshen_object(odb->packfiles, oid))
+ return 1;
+
+ odb_prepare_alternates(odb);
+ for (source = odb->sources; source; source = source->next)
+ if (odb_loose_source_freshen_object(source, oid))
+ return 1;
+
+ return 0;
+}
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect)
{
diff --git a/odb.h b/odb.h
index 25fbcd7d951..8681b7782b4 100644
--- a/odb.h
+++ b/odb.h
@@ -396,6 +396,9 @@ int odb_has_object(struct object_database *odb,
const struct object_id *oid,
unsigned flags);
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid);
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect);
diff --git a/packfile.c b/packfile.c
index 5a7caec2925..2ab49a0beb1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -819,6 +819,22 @@ struct packed_git *packfile_store_load_pack(struct packfile_store *store,
return p;
}
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid)
+{
+ struct pack_entry e;
+ if (!find_pack_entry(store->odb->repo, oid, &e))
+ return 0;
+ if (e.p->is_cruft)
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ if (utime(e.p->pack_name, NULL))
+ return 0;
+ e.p->freshened = 1;
+ return 1;
+}
+
void (*report_garbage)(unsigned seen_bits, const char *path);
static void report_helper(const struct string_list *list,
diff --git a/packfile.h b/packfile.h
index e7a5792b6cf..0ad080046f1 100644
--- a/packfile.h
+++ b/packfile.h
@@ -161,6 +161,9 @@ struct list_head *packfile_store_get_packs_mru(struct packfile_store *store);
struct packed_git *packfile_store_load_pack(struct packfile_store *store,
const char *idx_path, int local);
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid);
+
struct pack_window {
struct pack_window *next;
unsigned char *base;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 12/13] object-file: rename `write_object_file()`
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (10 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 11/13] object-file: refactor freshening of objects Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Rename `write_object_file()` to `odb_loose_source_write_object()` so
that it becomes clear that this is tied to a specific loose object
source. This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 8 ++++----
object-file.h | 10 +++++-----
odb.c | 3 ++-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index 5ea24de205d..d3e29e23c13 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,10 +1084,10 @@ int stream_loose_object(struct odb_source *source,
return err;
}
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags)
+int odb_loose_source_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags)
{
const struct git_hash_algo *algo = source->odb->repo->hash_algo;
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
diff --git a/object-file.h b/object-file.h
index b11a9b95498..428731970af 100644
--- a/object-file.h
+++ b/object-file.h
@@ -62,6 +62,11 @@ int odb_loose_source_has_object(struct odb_source *source,
int odb_loose_source_freshen_object(struct odb_source *source,
const struct object_id *oid);
+int odb_loose_source_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -168,11 +173,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags);
-
struct input_stream {
const void *(*read)(struct input_stream *, unsigned long *len);
void *data;
diff --git a/odb.c b/odb.c
index 6f8f665351b..432011b4dac 100644
--- a/odb.c
+++ b/odb.c
@@ -1021,7 +1021,8 @@ int odb_write_object_ext(struct object_database *odb,
struct object_id *compat_oid,
unsigned flags)
{
- return write_object_file(odb->sources, buf, len, type, oid, compat_oid, flags);
+ return odb_loose_source_write_object(odb->sources, buf, len, type,
+ oid, compat_oid, flags);
}
struct object_database *odb_new(struct repository *repo)
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 13/13] object-file: refactor writing objects via a stream
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
` (11 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
@ 2025-10-31 6:12 ` Patrick Steinhardt
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-10-31 6:12 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
We have two different ways to write an object into the database:
- We either provide the full buffer and write the object all at once.
- Or we provide an input stream that has a `read()` function so that
we can chunk the object.
The latter is especially used for large objects, where it may be too
expensive to hold the complete object in memory all at once.
While we already have `odb_write_object()` at the ODB-layer, we don't
have an equivalent for streaming an object. Introduce a new function
`odb_write_object_stream()` to address this gap so that callers don't
have to be aware of the inner workings of how to stream an object to
disk with a specific object source.
Rename `stream_loose_object()` to `odb_loose_source_write_stream()` to
clarify its scope. This matches our modern best practices around how to
name functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/unpack-objects.c | 7 +++----
object-file.c | 6 +++---
object-file.h | 14 ++++----------
odb.c | 7 +++++++
odb.h | 10 ++++++++++
5 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index ef79e43715d..6fc64e9e4b8 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -363,7 +363,7 @@ struct input_zstream_data {
int status;
};
-static const void *feed_input_zstream(struct input_stream *in_stream,
+static const void *feed_input_zstream(struct odb_write_stream *in_stream,
unsigned long *readlen)
{
struct input_zstream_data *data = in_stream->data;
@@ -393,7 +393,7 @@ static void stream_blob(unsigned long size, unsigned nr)
{
git_zstream zstream = { 0 };
struct input_zstream_data data = { 0 };
- struct input_stream in_stream = {
+ struct odb_write_stream in_stream = {
.read = feed_input_zstream,
.data = &data,
};
@@ -402,8 +402,7 @@ static void stream_blob(unsigned long size, unsigned nr)
data.zstream = &zstream;
git_inflate_init(&zstream);
- if (stream_loose_object(the_repository->objects->sources,
- &in_stream, size, &info->oid))
+ if (odb_write_object_stream(the_repository->objects, &in_stream, size, &info->oid))
die(_("failed to write object in stream"));
if (data.status != Z_STREAM_END)
diff --git a/object-file.c b/object-file.c
index d3e29e23c13..b02239f4e2d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -974,9 +974,9 @@ int odb_loose_source_freshen_object(struct odb_source *source,
return !!check_and_freshen_source(source, oid, 1);
}
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid)
+int odb_loose_source_write_stream(struct odb_source *source,
+ struct odb_write_stream *in_stream, size_t len,
+ struct object_id *oid)
{
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
struct object_id compat_oid;
diff --git a/object-file.h b/object-file.h
index 428731970af..c933be01fb8 100644
--- a/object-file.h
+++ b/object-file.h
@@ -67,6 +67,10 @@ int odb_loose_source_write_object(struct odb_source *source,
enum object_type type, struct object_id *oid,
struct object_id *compat_oid_in, unsigned flags);
+int odb_loose_source_write_stream(struct odb_source *source,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -173,16 +177,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-struct input_stream {
- const void *(*read)(struct input_stream *, unsigned long *len);
- void *data;
- int is_finished;
-};
-
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid);
-
int force_object_loose(struct odb_source *source,
const struct object_id *oid, time_t mtime);
diff --git a/odb.c b/odb.c
index 432011b4dac..62d65f71a6d 100644
--- a/odb.c
+++ b/odb.c
@@ -1025,6 +1025,13 @@ int odb_write_object_ext(struct object_database *odb,
oid, compat_oid, flags);
}
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid)
+{
+ return odb_loose_source_write_stream(odb->sources, stream, len, oid);
+}
+
struct object_database *odb_new(struct repository *repo)
{
struct object_database *o = xmalloc(sizeof(*o));
diff --git a/odb.h b/odb.h
index 8681b7782b4..92df20417b9 100644
--- a/odb.h
+++ b/odb.h
@@ -492,4 +492,14 @@ static inline int odb_write_object(struct object_database *odb,
return odb_write_object_ext(odb, buf, len, type, oid, NULL, 0);
}
+struct odb_write_stream {
+ const void *(*read)(struct odb_write_stream *, unsigned long *len);
+ void *data;
+ int is_finished;
+};
+
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
#endif /* ODB_H */
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-30 11:32 ` Patrick Steinhardt
2025-10-31 6:11 ` Patrick Steinhardt
@ 2025-10-31 16:10 ` Junio C Hamano
1 sibling, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2025-10-31 16:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git
Patrick Steinhardt <ps@pks.im> writes:
>> I wonder if the naming should instead be `struct obd_source_loose` that
>> way other backends (if added) would be something like:
>>
>> struct obd_source_loose
>> struct obd_source_postgres
>> struct obd_source_mongo
>>
>> This is easier to read and also for autocompletion it leads nicely into
>> the 'obd_source_...' namespace.
>
> Hm, I see your point. I think that "loose source" flows a bit more
> natural, but I agree that the above is more accessible in code.
>
> Before I change this: does anybody else have an opinion here?
I have a slight preference for what Karthik suggested. To me,
"There are three variants of odb-source, which are odb-source-A,
odb-source-B and odb-source-C" flows more naturally than
"odb-A-source, odb-B-source, ..."
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-31 6:11 ` Patrick Steinhardt
@ 2025-10-31 16:16 ` Junio C Hamano
2025-11-03 7:19 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2025-10-31 16:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git
Patrick Steinhardt <ps@pks.im> writes:
>> Hm, I see your point. I think that "loose source" flows a bit more
>> natural, but I agree that the above is more accessible in code.
>>
>> Before I change this: does anybody else have an opinion here?
>
> I think for now I'll stick to the current naming. This is due to two
> reasons:
Ah, I should have scanned my mailbox down to the end before starting
to respond.
>
> - As said, I think this flows more naturally in language. When talking
> about this you'll say "I'm using the files source" or "I'm using the
> whatever source".
I do not happen to agree with this, although my preference is minor.
> - It somewhat matches the naming we have in the reference backends,
> where we have `struct reftable_backend` and `struct files_backend`.
loose_source (without odb) may mirror calling "ref backend that uses
files" files_backend, because "ref" is redundant in the context of
talking about ref backends. "odb" is redundant when talking about
odb sources. But we are not calling them loose_odb_source,
hbase_odb_source, etc. and instead saying "odb_loose_source", which
I find is a bit strange order.
> That being said I don't feel very strong about this.
Neither do I.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/13] object-file: introduce `struct odb_loose_source`
2025-10-31 16:16 ` Junio C Hamano
@ 2025-11-03 7:19 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, git
On Fri, Oct 31, 2025 at 09:16:58AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> >> Hm, I see your point. I think that "loose source" flows a bit more
> >> natural, but I agree that the above is more accessible in code.
> >>
> >> Before I change this: does anybody else have an opinion here?
> >
> > I think for now I'll stick to the current naming. This is due to two
> > reasons:
>
> Ah, I should have scanned my mailbox down to the end before starting
> to respond.
>
> >
> > - As said, I think this flows more naturally in language. When talking
> > about this you'll say "I'm using the files source" or "I'm using the
> > whatever source".
>
> I do not happen to agree with this, although my preference is minor.
>
> > - It somewhat matches the naming we have in the reference backends,
> > where we have `struct reftable_backend` and `struct files_backend`.
>
> loose_source (without odb) may mirror calling "ref backend that uses
> files" files_backend, because "ref" is redundant in the context of
> talking about ref backends. "odb" is redundant when talking about
> odb sources. But we are not calling them loose_odb_source,
> hbase_odb_source, etc. and instead saying "odb_loose_source", which
> I find is a bit strange order.
>
> > That being said I don't feel very strong about this.
>
> Neither do I.
Okay, this makes one in favor favor of `odb_source_loose`, one slightly
in favor, and me slightly in favor of `odb_loose_source`. I'll rename,
thanks both for your input!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 00/13] Carve out loose object source
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
` (14 preceding siblings ...)
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
@ 2025-11-03 7:41 ` Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
` (13 more replies)
15 siblings, 14 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:41 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Hi,
this patch series carves out loose object sources. The idea is to store
all data that relates to loose objects in a single structure, similar to
our `struct packfile_store`.
The series is structured as follows:
- Patches 1 to 4 perform some cleanups in the vicinity of object
databases.
- Patches 5 to 8 create a new `struct source_loose` and move all
state that is specific to loose objects into it.
- Patches 9 to 13 then adjust functions to work on top of that new
structure.
The motivation here is to make handling of loose objects completely
self-contained as a step towards pluggable object databases.
Changes in v3:
- Rename `struct odb_loose_source` to `odb_source_loose`. Adjust its
functions accordingly.
- Link to v2: https://lore.kernel.org/r/20251031-b4-pks-odb-loose-backend-v2-0-920f721aef71@pks.im
Changes in v2:
- Rename `sturct odb_loose_source *` arguments from `source` to
`loose`.
- Some commit message improvements.
- Link to v1: https://lore.kernel.org/r/20251024-b4-pks-odb-loose-backend-v1-0-1a4202273c38@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (13):
odb: fix subtle logic to check whether an alternate is usable
odb: introduce `odb_source_new()`
odb: adjust naming to free object sources
object-file: move `fetch_if_missing`
object-file: introduce `struct odb_source_loose`
object-file: move loose object cache into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object map into loose source
object-file: read objects via the loose object source
object-file: rename `has_loose_object()`
object-file: refactor freshening of objects
object-file: rename `write_object_file()`
object-file: refactor writing objects via a stream
builtin/pack-objects.c | 4 +-
builtin/unpack-objects.c | 7 +-
loose.c | 19 ++---
object-file.c | 175 +++++++++++++++++++++--------------------------
object-file.h | 98 ++++++++++++++------------
object-name.c | 2 +-
odb.c | 104 +++++++++++++++++++---------
odb.h | 41 +++++++----
packfile.c | 16 +++++
packfile.h | 3 +
repository.c | 14 ++--
streaming.c | 11 ++-
12 files changed, 287 insertions(+), 207 deletions(-)
Range-diff versus v2:
1: 330b7c17e6a = 1: ed548a7ee4e odb: fix subtle logic to check whether an alternate is usable
2: 68b2e736d20 = 2: 59d5548ddfa odb: introduce `odb_source_new()`
3: b6f9a3f6d26 = 3: a417bd30153 odb: adjust naming to free object sources
4: fe38c58ab20 = 4: add2b7d112f object-file: move `fetch_if_missing`
5: 5fed6e7c429 ! 5: 5245ba0c6a5 object-file: introduce `struct odb_loose_source`
@@ Metadata
Author: Patrick Steinhardt <ps@pks.im>
## Commit message ##
- object-file: introduce `struct odb_loose_source`
+ object-file: introduce `struct odb_source_loose`
Currently, all state that relates to loose objects is held directly by
- the `struct odb_source`. Introduce a new `struct odb_loose_source` to
+ the `struct odb_source`. Introduce a new `struct odb_source_loose` to
hold the state instead so that it is entirely self-contained.
This structure will eventually morph into the backend for accessing
@@ object-file.c: void object_file_transaction_commit(struct odb_transaction *trans
free(transaction);
}
+
-+struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
++struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
+{
-+ struct odb_loose_source *loose;
++ struct odb_source_loose *loose;
+ CALLOC_ARRAY(loose, 1);
+ loose->source = source;
+ return loose;
+}
+
-+void odb_loose_source_free(struct odb_loose_source *loose)
++void odb_source_loose_free(struct odb_source_loose *loose)
+{
+ free(loose);
+}
@@ object-file.h: int index_path(struct index_state *istate, struct object_id *oid,
struct odb_source;
-+struct odb_loose_source {
++struct odb_source_loose {
+ struct odb_source *source;
+};
+
-+struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
-+void odb_loose_source_free(struct odb_loose_source *loose);
++struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
++void odb_source_loose_free(struct odb_source_loose *loose);
+
/*
* Populate and return the loose object cache array corresponding to the
@@ odb.c: struct odb_source *odb_source_new(struct object_database *odb,
source->odb = odb;
source->local = local;
source->path = xstrdup(path);
-+ source->loose = odb_loose_source_new(source);
++ source->loose = odb_source_loose_new(source);
return source;
}
@@ odb.c: struct odb_source *odb_set_temporary_primary_source(struct object_databas
static void odb_source_free(struct odb_source *source)
{
free(source->path);
-+ odb_loose_source_free(source->loose);
++ odb_source_loose_free(source->loose);
odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
@@ odb.h: struct odb_source {
struct object_database *odb;
+ /* Private state for loose objects. */
-+ struct odb_loose_source *loose;
++ struct odb_source_loose *loose;
+
/*
* Used to store the results of readdir(3) calls when we are OK
6: a66851a6afd ! 6: 3d931e4929d object-file: move loose object cache into loose source
@@ Commit message
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
- `struct odb_source`, but now that we have `struct odb_loose_source` it
+ `struct odb_source`, but now that we have `struct odb_source_loose` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
- relates to `struct odb_loose_source`.
+ relates to `struct odb_source_loose`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
- then do a cast inside these functions to `struct odb_loose_source`. This
+ then do a cast inside these functions to `struct odb_source_loose`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
@@ object-file.c: static int quick_has_loose(struct repository *r,
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_cache(source, oid), oid))
-+ if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
++ if (oidtree_contains(odb_source_loose_cache(source, oid), oid))
return 1;
}
return 0;
@@ object-file.c: static int append_loose_object(const struct object_id *oid,
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid)
-+struct oidtree *odb_loose_source_cache(struct odb_source *source,
++struct oidtree *odb_source_loose_cache(struct odb_source *source,
+ const struct object_id *oid)
{
int subdir_nr = oid->hash[0];
@@ object-file.c: static int append_loose_object(const struct object_id *oid,
}
static int check_stream_oid(git_zstream *stream,
-@@ object-file.c: struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
+@@ object-file.c: struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
- void odb_loose_source_free(struct odb_loose_source *loose)
+ void odb_source_loose_free(struct odb_source_loose *loose)
{
+ if (!loose)
+ return;
@@ object-file.c: struct odb_loose_source *odb_loose_source_new(struct odb_source *
## object-file.h ##
@@ object-file.h: struct odb_source;
- struct odb_loose_source {
+ struct odb_source_loose {
struct odb_source *source;
+
+ /*
@@ object-file.h: struct odb_source;
+ struct oidtree *cache;
};
- struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
+ struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
+@@ object-file.h: void odb_source_loose_free(struct odb_source_loose *loose);
* Populate and return the loose object cache array corresponding to the
* given object ID.
*/
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid);
-+struct oidtree *odb_loose_source_cache(struct odb_source *source,
++struct oidtree *odb_source_loose_cache(struct odb_source *source,
+ const struct object_id *oid);
/* Empty the loose object cache for the specified object directory. */
@@ object-name.c: static void find_short_object_filename(struct disambiguate_state
for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next)
- oidtree_each(odb_loose_cache(source, &ds->bin_pfx),
-+ oidtree_each(odb_loose_source_cache(source, &ds->bin_pfx),
++ oidtree_each(odb_source_loose_cache(source, &ds->bin_pfx),
&ds->bin_pfx, ds->len, match_prefix, ds);
}
@@ odb.c
@@ odb.c: static void odb_source_free(struct odb_source *source)
{
free(source->path);
- odb_loose_source_free(source->loose);
+ odb_source_loose_free(source->loose);
- odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
@@ odb.c: static void odb_source_free(struct odb_source *source)
## odb.h ##
@@ odb.h: struct odb_source {
/* Private state for loose objects. */
- struct odb_loose_source *loose;
+ struct odb_source_loose *loose;
- /*
- * Used to store the results of readdir(3) calls when we are OK
7: ac94974a4b4 ! 7: 6eb525fa775 object-file: hide internals when we need to reprepare loose sources
@@ Commit message
- When repreparing the loose object source so that any potentially-
stale data is getting evicted from the cache.
- The former is already handled by `odb_loose_source_free()`. But the
+ The former is already handled by `odb_source_loose_free()`. But the
latter case is still done manually by in `odb_reprepare()`, so we are
leaking internals into that code.
- Introduce a new `odb_loose_source_reprepare()` function as an equivalent
+ Introduce a new `odb_source_loose_reprepare()` function as an equivalent
to `packfile_store_prepare()` to hide these implementation details.
Furthermore, while at it, rename the function `odb_clear_loose_cache()`
- to `odb_loose_source_clear()`.
+ to `odb_source_loose_clear()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
## object-file.c ##
-@@ object-file.c: struct oidtree *odb_loose_source_cache(struct odb_source *source,
+@@ object-file.c: struct oidtree *odb_source_loose_cache(struct odb_source *source,
return source->loose->cache;
}
-void odb_clear_loose_cache(struct odb_source *source)
-+static void odb_loose_source_clear_cache(struct odb_loose_source *loose)
++static void odb_source_loose_clear_cache(struct odb_source_loose *loose)
{
- oidtree_clear(source->loose->cache);
- FREE_AND_NULL(source->loose->cache);
@@ object-file.c: struct oidtree *odb_loose_source_cache(struct odb_source *source,
+ sizeof(loose->subdir_seen));
+}
+
-+void odb_loose_source_reprepare(struct odb_source *source)
++void odb_source_loose_reprepare(struct odb_source *source)
+{
-+ odb_loose_source_clear_cache(source->loose);
++ odb_source_loose_clear_cache(source->loose);
}
static int check_stream_oid(git_zstream *stream,
-@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *loose)
+@@ object-file.c: void odb_source_loose_free(struct odb_source_loose *loose)
{
if (!loose)
return;
- odb_clear_loose_cache(loose->source);
-+ odb_loose_source_clear_cache(loose);
++ odb_source_loose_clear_cache(loose);
free(loose);
}
## object-file.h ##
-@@ object-file.h: struct odb_loose_source {
- struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
- void odb_loose_source_free(struct odb_loose_source *loose);
+@@ object-file.h: struct odb_source_loose {
+ struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
+ void odb_source_loose_free(struct odb_source_loose *loose);
+/* Reprepare the loose source by emptying the loose object cache. */
-+void odb_loose_source_reprepare(struct odb_source *source);
++void odb_source_loose_reprepare(struct odb_source *source);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
- struct oidtree *odb_loose_source_cache(struct odb_source *source,
+@@ object-file.h: void odb_source_loose_free(struct odb_source_loose *loose);
+ struct oidtree *odb_source_loose_cache(struct odb_source *source,
const struct object_id *oid);
-/* Empty the loose object cache for the specified object directory. */
@@ odb.c: void odb_reprepare(struct object_database *o)
for (source = o->sources; source; source = source->next)
- odb_clear_loose_cache(source);
-+ odb_loose_source_reprepare(source);
++ odb_source_loose_reprepare(source);
o->approximate_object_count_valid = 0;
8: 8820c59b0d1 ! 8: f5aff5356f1 object-file: move loose object map into loose source
@@ loose.c: int repo_loose_object_map_oid(struct repository *repo,
map = (to == repo->compat_hash_algo) ?
## object-file.c ##
-@@ object-file.c: void odb_loose_source_free(struct odb_loose_source *loose)
+@@ object-file.c: void odb_source_loose_free(struct odb_source_loose *loose)
if (!loose)
return;
- odb_loose_source_clear_cache(loose);
+ odb_source_loose_clear_cache(loose);
+ loose_object_map_clear(&loose->map);
free(loose);
}
## object-file.h ##
-@@ object-file.h: struct odb_loose_source {
+@@ object-file.h: struct odb_source_loose {
*/
uint32_t subdir_seen[8]; /* 256 bits */
struct oidtree *cache;
@@ object-file.h: struct odb_loose_source {
+ struct loose_object_map *map;
};
- struct odb_loose_source *odb_loose_source_new(struct odb_source *source);
+ struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
## odb.c ##
@@ odb.c: static void odb_source_free(struct odb_source *source)
{
free(source->path);
- odb_loose_source_free(source->loose);
+ odb_source_loose_free(source->loose);
- loose_object_map_clear(&source->loose_map);
free(source);
}
@@ odb.c: static void odb_source_free(struct odb_source *source)
## odb.h ##
@@ odb.h: struct odb_source {
/* Private state for loose objects. */
- struct odb_loose_source *loose;
+ struct odb_source_loose *loose;
- /* Map between object IDs for loose objects. */
- struct loose_object_map *loose_map;
9: 0a549d59861 ! 9: 2e328d81928 object-file: read objects via the loose object source
@@ object-file.c: int stream_object_signature(struct repository *r, const struct ob
* call to stat_loose_object().
*/
-static int stat_loose_object(struct repository *r, const struct object_id *oid,
-+static int stat_loose_object(struct odb_loose_source *loose,
++static int stat_loose_object(struct odb_source_loose *loose,
+ const struct object_id *oid,
struct stat *st, const char **path)
{
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
* descriptor. See the caveats on the "path" parameter above.
*/
-static int open_loose_object(struct repository *r,
-+static int open_loose_object(struct odb_loose_source *loose,
++static int open_loose_object(struct odb_source_loose *loose,
const struct object_id *oid, const char **path)
{
- int fd;
@@ object-file.c: static int stat_loose_object(struct repository *r, const struct o
}
-static int quick_has_loose(struct repository *r,
-+static int quick_has_loose(struct odb_loose_source *loose,
++static int quick_has_loose(struct odb_source_loose *loose,
const struct object_id *oid)
{
- struct odb_source *source;
-
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
-- if (oidtree_contains(odb_loose_source_cache(source, oid), oid))
+- if (oidtree_contains(odb_source_loose_cache(source, oid), oid))
- return 1;
- }
- return 0;
-+ return !!oidtree_contains(odb_loose_source_cache(loose->source, oid), oid);
++ return !!oidtree_contains(odb_source_loose_cache(loose->source, oid), oid);
}
/*
@@ object-file.c: static void *map_fd(int fd, const char *path, unsigned long *size
-void *map_loose_object(struct repository *r,
- const struct object_id *oid,
- unsigned long *size)
-+void *odb_loose_source_map_object(struct odb_source *source,
++void *odb_source_loose_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size)
{
@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags)
-+int odb_loose_source_read_object_info(struct odb_source *source,
++int odb_source_loose_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags)
{
@@ object-file.c: int loose_object_info(struct repository *r,
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
## object-file.h ##
-@@ object-file.h: void odb_loose_source_free(struct odb_loose_source *loose);
+@@ object-file.h: void odb_source_loose_free(struct odb_source_loose *loose);
/* Reprepare the loose source by emptying the loose object cache. */
- void odb_loose_source_reprepare(struct odb_source *source);
+ void odb_source_loose_reprepare(struct odb_source *source);
-+int odb_loose_source_read_object_info(struct odb_source *source,
++int odb_source_loose_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags);
+
-+void *odb_loose_source_map_object(struct odb_source *source,
++void *odb_source_loose_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size);
+
@@ odb.c: static int do_oid_object_info_extended(struct object_database *odb,
- if (!loose_object_info(odb->repo, real, oi, flags))
- return 0;
+ for (source = odb->sources; source; source = source->next)
-+ if (!odb_loose_source_read_object_info(source, real, oi, flags))
++ if (!odb_source_loose_read_object_info(source, real, oi, flags))
+ return 0;
/* Not a loose object; someone else may have just packed it. */
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct reposi
- st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
-+ st->u.loose.mapped = odb_loose_source_map_object(source, oid,
++ st->u.loose.mapped = odb_source_loose_map_object(source, oid,
+ &st->u.loose.mapsize);
+ if (st->u.loose.mapped)
+ break;
10: bcc58f6f7e5 ! 10: 29787a709b3 object-file: rename `has_loose_object()`
@@ Metadata
## Commit message ##
object-file: rename `has_loose_object()`
- Rename `has_loose_object()` to `odb_loose_source_has_object()` so that
+ Rename `has_loose_object()` to `odb_source_loose_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.
@@ builtin/pack-objects.c: static int want_object_in_pack_mtime(const struct object
struct odb_source *source = the_repository->objects->sources->next;
for (; source; source = source->next)
- if (has_loose_object(source, oid))
-+ if (odb_loose_source_has_object(source, oid))
++ if (odb_source_loose_has_object(source, oid))
return 0;
}
@@ builtin/pack-objects.c: static void add_cruft_object_entry(const struct object_i
for (; !found && source; source = source->next)
- if (has_loose_object(source, oid))
-+ if (odb_loose_source_has_object(source, oid))
++ if (odb_source_loose_has_object(source, oid))
found = 1;
/*
@@ object-file.c: static int check_and_freshen_source(struct odb_source *source,
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid)
-+int odb_loose_source_has_object(struct odb_source *source,
++int odb_source_loose_has_object(struct odb_source *source,
+ const struct object_id *oid)
{
return check_and_freshen_source(source, oid, 0);
@@ object-file.c: int force_object_loose(struct odb_source *source,
for (struct odb_source *s = source->odb->sources; s; s = s->next)
- if (has_loose_object(s, oid))
-+ if (odb_loose_source_has_object(s, oid))
++ if (odb_source_loose_has_object(s, oid))
return 0;
oi.typep = &type;
## object-file.h ##
-@@ object-file.h: void *odb_loose_source_map_object(struct odb_source *source,
+@@ object-file.h: void *odb_source_loose_map_object(struct odb_source *source,
const struct object_id *oid,
unsigned long *size);
@@ object-file.h: void *odb_loose_source_map_object(struct odb_source *source,
+ * with the specified name. This function does not respect replace
+ * references.
+ */
-+int odb_loose_source_has_object(struct odb_source *source,
++int odb_source_loose_has_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
11: 60a5e4882cd ! 11: 3b95f712fb4 object-file: refactor freshening of objects
@@ Commit message
Introduce a new `odb_freshen_object()` function that sits on the object
database level and two functions `packfile_store_freshen_object()` and
- `odb_loose_source_freshen_object()`. Like this, the format-specific
+ `odb_source_loose_freshen_object()`. Like this, the format-specific
functions can be part of their respective subsystems, while the backend
agnostic function to freshen an object sits at the object database
layer.
@@ object-file.c: static int write_loose_object(struct odb_source *source,
-static int freshen_loose_object(struct object_database *odb,
- const struct object_id *oid)
-+int odb_loose_source_freshen_object(struct odb_source *source,
++int odb_source_loose_freshen_object(struct odb_source *source,
+ const struct object_id *oid)
{
- odb_prepare_alternates(odb);
@@ object-file.c: int write_object_file(struct odb_source *source,
return -1;
## object-file.h ##
-@@ object-file.h: void *odb_loose_source_map_object(struct odb_source *source,
- int odb_loose_source_has_object(struct odb_source *source,
+@@ object-file.h: void *odb_source_loose_map_object(struct odb_source *source,
+ int odb_source_loose_has_object(struct odb_source *source,
const struct object_id *oid);
-+int odb_loose_source_freshen_object(struct odb_source *source,
++int odb_source_loose_freshen_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
@@ odb.c: int odb_has_object(struct object_database *odb, const struct object_id *o
+
+ odb_prepare_alternates(odb);
+ for (source = odb->sources; source; source = source->next)
-+ if (odb_loose_source_freshen_object(source, oid))
++ if (odb_source_loose_freshen_object(source, oid))
+ return 1;
+
+ return 0;
12: d31eec55aec ! 12: 31970221960 object-file: rename `write_object_file()`
@@ Metadata
## Commit message ##
object-file: rename `write_object_file()`
- Rename `write_object_file()` to `odb_loose_source_write_object()` so
+ Rename `write_object_file()` to `odb_source_loose_write_object()` so
that it becomes clear that this is tied to a specific loose object
source. This matches our modern naming schema for functions.
@@ object-file.c: int stream_loose_object(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags)
-+int odb_loose_source_write_object(struct odb_source *source,
++int odb_source_loose_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags)
@@ object-file.c: int stream_loose_object(struct odb_source *source,
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
## object-file.h ##
-@@ object-file.h: int odb_loose_source_has_object(struct odb_source *source,
- int odb_loose_source_freshen_object(struct odb_source *source,
+@@ object-file.h: int odb_source_loose_has_object(struct odb_source *source,
+ int odb_source_loose_freshen_object(struct odb_source *source,
const struct object_id *oid);
-+int odb_loose_source_write_object(struct odb_source *source,
++int odb_source_loose_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags);
@@ odb.c: int odb_write_object_ext(struct object_database *odb,
unsigned flags)
{
- return write_object_file(odb->sources, buf, len, type, oid, compat_oid, flags);
-+ return odb_loose_source_write_object(odb->sources, buf, len, type,
++ return odb_source_loose_write_object(odb->sources, buf, len, type,
+ oid, compat_oid, flags);
}
13: 84fbe7758d0 ! 13: cec069fd2df object-file: refactor writing objects via a stream
@@ Commit message
have to be aware of the inner workings of how to stream an object to
disk with a specific object source.
- Rename `stream_loose_object()` to `odb_loose_source_write_stream()` to
+ Rename `stream_loose_object()` to `odb_source_loose_write_stream()` to
clarify its scope. This matches our modern best practices around how to
name functions.
@@ builtin/unpack-objects.c: static void stream_blob(unsigned long size, unsigned n
if (data.status != Z_STREAM_END)
## object-file.c ##
-@@ object-file.c: int odb_loose_source_freshen_object(struct odb_source *source,
+@@ object-file.c: int odb_source_loose_freshen_object(struct odb_source *source,
return !!check_and_freshen_source(source, oid, 1);
}
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid)
-+int odb_loose_source_write_stream(struct odb_source *source,
++int odb_source_loose_write_stream(struct odb_source *source,
+ struct odb_write_stream *in_stream, size_t len,
+ struct object_id *oid)
{
@@ object-file.c: int odb_loose_source_freshen_object(struct odb_source *source,
struct object_id compat_oid;
## object-file.h ##
-@@ object-file.h: int odb_loose_source_write_object(struct odb_source *source,
+@@ object-file.h: int odb_source_loose_write_object(struct odb_source *source,
enum object_type type, struct object_id *oid,
struct object_id *compat_oid_in, unsigned flags);
-+int odb_loose_source_write_stream(struct odb_source *source,
++int odb_source_loose_write_stream(struct odb_source *source,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
@@ odb.c: int odb_write_object_ext(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid)
+{
-+ return odb_loose_source_write_stream(odb->sources, stream, len, oid);
++ return odb_source_loose_write_stream(odb->sources, stream, len, oid);
+}
+
struct object_database *odb_new(struct repository *repo)
---
base-commit: c54a18ef67e59cdbcd77d6294916d42c98c62d1d
change-id: 20251017-b4-pks-odb-loose-backend-b1e003c41107
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 01/13] odb: fix subtle logic to check whether an alternate is usable
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
@ 2025-11-03 7:41 ` Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
` (12 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:41 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When adding an alternate to the object database we first check whether
or not the path is usable. A path is usable if:
- It actually exists.
- We don't have it in our object sources yet.
While the former check is trivial enough, the latter part is somewhat
subtle and prone for bugs. This is because the function doesn't only
check whether or not the given path is usable. But if it _is_ usable, we
also store that path in the map of object sources immediately.
The tricky part here is that the path that gets stored in the map is
_not_ copied. Instead, we rely on the fact that subsequent code uses
`strbuf_detach()` to store the exact same allocated memory in the
created object source. Consequently, the memory is owned by the source
but _also_ stored in the map. This subtlety is easy to miss, so if one
decides to refactor this code one can easily end up breaking this
mechanism.
Make the relationship more explicit by not storing the path as part of
`alt_odb_usable()`. Instead, store the path after we have created the
source so that we can use the source's path pointer directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/odb.c b/odb.c
index 00a6e71568b..57d85ed9505 100644
--- a/odb.c
+++ b/odb.c
@@ -86,17 +86,16 @@ int odb_mkstemp(struct object_database *odb,
/*
* Return non-zero iff the path is usable as an alternate object database.
*/
-static int alt_odb_usable(struct object_database *o,
- struct strbuf *path,
- const char *normalized_objdir, khiter_t *pos)
+static int alt_odb_usable(struct object_database *o, const char *path,
+ const char *normalized_objdir)
{
int r;
/* Detect cases where alternate disappeared */
- if (!is_directory(path->buf)) {
+ if (!is_directory(path)) {
error(_("object directory %s does not exist; "
"check .git/objects/info/alternates"),
- path->buf);
+ path);
return 0;
}
@@ -113,11 +112,14 @@ static int alt_odb_usable(struct object_database *o,
assert(r == 1); /* never used */
kh_value(o->source_by_path, p) = o->sources;
}
- if (fspatheq(path->buf, normalized_objdir))
+
+ if (fspatheq(path, normalized_objdir))
+ return 0;
+
+ if (kh_get_odb_path_map(o->source_by_path, path) < kh_end(o->source_by_path))
return 0;
- *pos = kh_put_odb_path_map(o->source_by_path, path->buf, &r);
- /* r: 0 = exists, 1 = never used, 2 = deleted */
- return r == 0 ? 0 : 1;
+
+ return 1;
}
/*
@@ -148,6 +150,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
struct strbuf pathbuf = STRBUF_INIT;
struct strbuf tmp = STRBUF_INIT;
khiter_t pos;
+ int ret;
if (!is_absolute_path(dir) && relative_base) {
strbuf_realpath(&pathbuf, relative_base, 1);
@@ -172,20 +175,21 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
strbuf_reset(&tmp);
strbuf_realpath(&tmp, odb->sources->path, 1);
- if (!alt_odb_usable(odb, &pathbuf, tmp.buf, &pos))
+ if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
CALLOC_ARRAY(alternate, 1);
alternate->odb = odb;
alternate->local = false;
- /* pathbuf.buf is already in r->objects->source_by_path */
alternate->path = strbuf_detach(&pathbuf, NULL);
/* add the alternate entry */
*odb->sources_tail = alternate;
odb->sources_tail = &(alternate->next);
- alternate->next = NULL;
- assert(odb->source_by_path);
+
+ pos = kh_put_odb_path_map(odb->source_by_path, alternate->path, &ret);
+ if (!ret)
+ BUG("source must not yet exist");
kh_value(odb->source_by_path, pos) = alternate;
/* recursively add alternates */
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 02/13] odb: introduce `odb_source_new()`
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
@ 2025-11-03 7:41 ` Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 03/13] odb: adjust naming to free object sources Patrick Steinhardt
` (11 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:41 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
We have three different locations where we create a new ODB source.
Deduplicate the logic via a new `odb_source_new()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 23 ++++++++++++++++-------
odb.h | 4 ++++
repository.c | 14 +++++++++-----
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/odb.c b/odb.c
index 57d85ed9505..d2d4c514ae5 100644
--- a/odb.c
+++ b/odb.c
@@ -141,6 +141,20 @@ static void read_info_alternates(struct object_database *odb,
const char *relative_base,
int depth);
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local)
+{
+ struct odb_source *source;
+
+ CALLOC_ARRAY(source, 1);
+ source->odb = odb;
+ source->local = local;
+ source->path = xstrdup(path);
+
+ return source;
+}
+
static struct odb_source *link_alt_odb_entry(struct object_database *odb,
const char *dir,
const char *relative_base,
@@ -178,10 +192,7 @@ static struct odb_source *link_alt_odb_entry(struct object_database *odb,
if (!alt_odb_usable(odb, pathbuf.buf, tmp.buf))
goto error;
- CALLOC_ARRAY(alternate, 1);
- alternate->odb = odb;
- alternate->local = false;
- alternate->path = strbuf_detach(&pathbuf, NULL);
+ alternate = odb_source_new(odb, pathbuf.buf, false);
/* add the alternate entry */
*odb->sources_tail = alternate;
@@ -341,9 +352,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
* Make a new primary odb and link the old primary ODB in as an
* alternate
*/
- source = xcalloc(1, sizeof(*source));
- source->odb = odb;
- source->path = xstrdup(dir);
+ source = odb_source_new(odb, dir, false);
/*
* Disable ref updates while a temporary odb is active, since
diff --git a/odb.h b/odb.h
index e6602dd90c8..2bec895d135 100644
--- a/odb.h
+++ b/odb.h
@@ -89,6 +89,10 @@ struct odb_source {
char *path;
};
+struct odb_source *odb_source_new(struct object_database *odb,
+ const char *path,
+ bool local);
+
struct packed_git;
struct packfile_store;
struct cached_object_entry;
diff --git a/repository.c b/repository.c
index 6faf5c73981..6aaa7ba0086 100644
--- a/repository.c
+++ b/repository.c
@@ -160,20 +160,24 @@ void repo_set_gitdir(struct repository *repo,
* until after xstrdup(root). Then we can free it.
*/
char *old_gitdir = repo->gitdir;
+ char *objects_path = NULL;
repo->gitdir = xstrdup(gitfile ? gitfile : root);
free(old_gitdir);
repo_set_commondir(repo, o->commondir);
+ expand_base_dir(&objects_path, o->object_dir,
+ repo->commondir, "objects");
if (!repo->objects->sources) {
- CALLOC_ARRAY(repo->objects->sources, 1);
- repo->objects->sources->odb = repo->objects;
- repo->objects->sources->local = true;
+ repo->objects->sources = odb_source_new(repo->objects,
+ objects_path, true);
repo->objects->sources_tail = &repo->objects->sources->next;
+ free(objects_path);
+ } else {
+ free(repo->objects->sources->path);
+ repo->objects->sources->path = objects_path;
}
- expand_base_dir(&repo->objects->sources->path, o->object_dir,
- repo->commondir, "objects");
repo->objects->sources->disable_ref_updates = o->disable_ref_updates;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 03/13] odb: adjust naming to free object sources
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
@ 2025-11-03 7:41 ` Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
` (10 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:41 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The functions `free_object_directory()` and `free_object_directories()`
are responsible for freeing a single object source or all object sources
connected to an object database, respectively. The associated structure
has been renamed from `struct object_directory` to `struct odb_source`
in a1e2581a1e (object-store: rename `object_directory` to `odb_source`,
2025-07-01) though, so the names are somewhat stale nowadays.
Rename them to mention the new struct name instead. Furthermore, while
at it, adapt them to our modern naming schema where we first have the
subject followed by a verb.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/odb.c b/odb.c
index d2d4c514ae5..77490d7fdbe 100644
--- a/odb.c
+++ b/odb.c
@@ -365,7 +365,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
return source->next;
}
-static void free_object_directory(struct odb_source *source)
+static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_clear_loose_cache(source);
@@ -387,7 +387,7 @@ void odb_restore_primary_source(struct object_database *odb,
BUG("we expect the old primary object store to be the first alternate");
odb->sources = restore_source;
- free_object_directory(cur_source);
+ odb_source_free(cur_source);
}
char *compute_alternate_path(const char *path, struct strbuf *err)
@@ -1015,13 +1015,13 @@ struct object_database *odb_new(struct repository *repo)
return o;
}
-static void free_object_directories(struct object_database *o)
+static void odb_free_sources(struct object_database *o)
{
while (o->sources) {
struct odb_source *next;
next = o->sources->next;
- free_object_directory(o->sources);
+ odb_source_free(o->sources);
o->sources = next;
}
kh_destroy_odb_path_map(o->source_by_path);
@@ -1039,7 +1039,7 @@ void odb_clear(struct object_database *o)
o->commit_graph = NULL;
o->commit_graph_attempted = 0;
- free_object_directories(o);
+ odb_free_sources(o);
o->sources_tail = NULL;
o->loaded_alternates = 0;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 04/13] object-file: move `fetch_if_missing`
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (2 preceding siblings ...)
2025-11-03 7:41 ` [PATCH v3 03/13] odb: adjust naming to free object sources Patrick Steinhardt
@ 2025-11-03 7:41 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 05/13] object-file: introduce `struct odb_source_loose` Patrick Steinhardt
` (9 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:41 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The `fetch_if_missing` global variable is declared in "object-file.h"
but defined in "odb.c". The variable relates to the whole object
database instead of only loose objects, so move the declaration into
"odb.h" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.h | 8 --------
odb.h | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/object-file.h b/object-file.h
index 3fd48dcafbf..097e9764be1 100644
--- a/object-file.h
+++ b/object-file.h
@@ -7,14 +7,6 @@
struct index_state;
-/*
- * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
- * blobs. This has a difference only if extensions.partialClone is set.
- *
- * Its default value is 1.
- */
-extern int fetch_if_missing;
-
enum {
INDEX_WRITE_OBJECT = (1 << 0),
INDEX_FORMAT_CHECK = (1 << 1),
diff --git a/odb.h b/odb.h
index 2bec895d135..2346ffeca85 100644
--- a/odb.h
+++ b/odb.h
@@ -14,6 +14,14 @@ struct strbuf;
struct repository;
struct multi_pack_index;
+/*
+ * Set this to 0 to prevent odb_read_object_info_extended() from fetching missing
+ * blobs. This has a difference only if extensions.partialClone is set.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
/*
* Compute the exact path an alternate is at and returns it. In case of
* error NULL is returned and the human readable error is added to `err`
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 05/13] object-file: introduce `struct odb_source_loose`
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (3 preceding siblings ...)
2025-11-03 7:41 ` [PATCH v3 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
` (8 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Currently, all state that relates to loose objects is held directly by
the `struct odb_source`. Introduce a new `struct odb_source_loose` to
hold the state instead so that it is entirely self-contained.
This structure will eventually morph into the backend for accessing
loose objects. As such, this is part of the refactorings to introduce
pluggable object databases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 13 +++++++++++++
object-file.h | 7 +++++++
odb.c | 2 ++
odb.h | 3 +++
4 files changed, 25 insertions(+)
diff --git a/object-file.c b/object-file.c
index 4675c8ed6b6..cd6aa561fa7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1995,3 +1995,16 @@ void object_file_transaction_commit(struct odb_transaction *transaction)
transaction->odb->transaction = NULL;
free(transaction);
}
+
+struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
+{
+ struct odb_source_loose *loose;
+ CALLOC_ARRAY(loose, 1);
+ loose->source = source;
+ return loose;
+}
+
+void odb_source_loose_free(struct odb_source_loose *loose)
+{
+ free(loose);
+}
diff --git a/object-file.h b/object-file.h
index 097e9764be1..695a7e8e7c4 100644
--- a/object-file.h
+++ b/object-file.h
@@ -18,6 +18,13 @@ int index_path(struct index_state *istate, struct object_id *oid, const char *pa
struct odb_source;
+struct odb_source_loose {
+ struct odb_source *source;
+};
+
+struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
+void odb_source_loose_free(struct odb_source_loose *loose);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 77490d7fdbe..2d06ab0bb85 100644
--- a/odb.c
+++ b/odb.c
@@ -151,6 +151,7 @@ struct odb_source *odb_source_new(struct object_database *odb,
source->odb = odb;
source->local = local;
source->path = xstrdup(path);
+ source->loose = odb_source_loose_new(source);
return source;
}
@@ -368,6 +369,7 @@ struct odb_source *odb_set_temporary_primary_source(struct object_database *odb,
static void odb_source_free(struct odb_source *source)
{
free(source->path);
+ odb_source_loose_free(source->loose);
odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
diff --git a/odb.h b/odb.h
index 2346ffeca85..49b398bedae 100644
--- a/odb.h
+++ b/odb.h
@@ -48,6 +48,9 @@ struct odb_source {
/* Object database that owns this object source. */
struct object_database *odb;
+ /* Private state for loose objects. */
+ struct odb_source_loose *loose;
+
/*
* Used to store the results of readdir(3) calls when we are OK
* sacrificing accuracy due to races for speed. That includes
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 06/13] object-file: move loose object cache into loose source
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (4 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 05/13] object-file: introduce `struct odb_source_loose` Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
` (7 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
`struct odb_source`, but now that we have `struct odb_source_loose` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
relates to `struct odb_source_loose`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
then do a cast inside these functions to `struct odb_source_loose`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 9 +++++----
object-file.c | 35 +++++++++++++++++++----------------
object-file.h | 16 ++++++++++++++--
object-name.c | 2 +-
odb.c | 1 -
odb.h | 12 ------------
6 files changed, 39 insertions(+), 36 deletions(-)
diff --git a/loose.c b/loose.c
index e8ea6e7e24b..8cc7573ff2b 100644
--- a/loose.c
+++ b/loose.c
@@ -1,6 +1,7 @@
#include "git-compat-util.h"
#include "hash.h"
#include "path.h"
+#include "object-file.h"
#include "odb.h"
#include "hex.h"
#include "repository.h"
@@ -54,7 +55,7 @@ static int insert_loose_map(struct odb_source *source,
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
if (inserted)
- oidtree_insert(source->loose_objects_cache, compat_oid);
+ oidtree_insert(source->loose->cache, compat_oid);
return inserted;
}
@@ -66,9 +67,9 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
if (!source->loose_map)
loose_object_map_init(&source->loose_map);
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
insert_loose_map(source, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
diff --git a/object-file.c b/object-file.c
index cd6aa561fa7..fef00d6d3d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -223,7 +223,7 @@ static int quick_has_loose(struct repository *r,
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_loose_cache(source, oid), oid))
+ if (oidtree_contains(odb_source_loose_cache(source, oid), oid))
return 1;
}
return 0;
@@ -1802,44 +1802,44 @@ static int append_loose_object(const struct object_id *oid,
return 0;
}
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid)
+struct oidtree *odb_source_loose_cache(struct odb_source *source,
+ const struct object_id *oid)
{
int subdir_nr = oid->hash[0];
struct strbuf buf = STRBUF_INIT;
- size_t word_bits = bitsizeof(source->loose_objects_subdir_seen[0]);
+ size_t word_bits = bitsizeof(source->loose->subdir_seen[0]);
size_t word_index = subdir_nr / word_bits;
size_t mask = (size_t)1u << (subdir_nr % word_bits);
uint32_t *bitmap;
if (subdir_nr < 0 ||
- (size_t) subdir_nr >= bitsizeof(source->loose_objects_subdir_seen))
+ (size_t) subdir_nr >= bitsizeof(source->loose->subdir_seen))
BUG("subdir_nr out of range");
- bitmap = &source->loose_objects_subdir_seen[word_index];
+ bitmap = &source->loose->subdir_seen[word_index];
if (*bitmap & mask)
- return source->loose_objects_cache;
- if (!source->loose_objects_cache) {
- ALLOC_ARRAY(source->loose_objects_cache, 1);
- oidtree_init(source->loose_objects_cache);
+ return source->loose->cache;
+ if (!source->loose->cache) {
+ ALLOC_ARRAY(source->loose->cache, 1);
+ oidtree_init(source->loose->cache);
}
strbuf_addstr(&buf, source->path);
for_each_file_in_obj_subdir(subdir_nr, &buf,
source->odb->repo->hash_algo,
append_loose_object,
NULL, NULL,
- source->loose_objects_cache);
+ source->loose->cache);
*bitmap |= mask;
strbuf_release(&buf);
- return source->loose_objects_cache;
+ return source->loose->cache;
}
void odb_clear_loose_cache(struct odb_source *source)
{
- oidtree_clear(source->loose_objects_cache);
- FREE_AND_NULL(source->loose_objects_cache);
- memset(&source->loose_objects_subdir_seen, 0,
- sizeof(source->loose_objects_subdir_seen));
+ oidtree_clear(source->loose->cache);
+ FREE_AND_NULL(source->loose->cache);
+ memset(&source->loose->subdir_seen, 0,
+ sizeof(source->loose->subdir_seen));
}
static int check_stream_oid(git_zstream *stream,
@@ -2006,5 +2006,8 @@ struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
void odb_source_loose_free(struct odb_source_loose *loose)
{
+ if (!loose)
+ return;
+ odb_clear_loose_cache(loose->source);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index 695a7e8e7c4..90da69cf5f7 100644
--- a/object-file.h
+++ b/object-file.h
@@ -20,6 +20,18 @@ struct odb_source;
struct odb_source_loose {
struct odb_source *source;
+
+ /*
+ * Used to store the results of readdir(3) calls when we are OK
+ * sacrificing accuracy due to races for speed. That includes
+ * object existence with OBJECT_INFO_QUICK, as well as
+ * our search for unique abbreviated hashes. Don't use it for tasks
+ * requiring greater accuracy!
+ *
+ * Be sure to call odb_load_loose_cache() before using.
+ */
+ uint32_t subdir_seen[8]; /* 256 bits */
+ struct oidtree *cache;
};
struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
@@ -29,8 +41,8 @@ void odb_source_loose_free(struct odb_source_loose *loose);
* Populate and return the loose object cache array corresponding to the
* given object ID.
*/
-struct oidtree *odb_loose_cache(struct odb_source *source,
- const struct object_id *oid);
+struct oidtree *odb_source_loose_cache(struct odb_source *source,
+ const struct object_id *oid);
/* Empty the loose object cache for the specified object directory. */
void odb_clear_loose_cache(struct odb_source *source);
diff --git a/object-name.c b/object-name.c
index f6902e140dd..ba12ebf16f5 100644
--- a/object-name.c
+++ b/object-name.c
@@ -116,7 +116,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
struct odb_source *source;
for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next)
- oidtree_each(odb_loose_cache(source, &ds->bin_pfx),
+ oidtree_each(odb_source_loose_cache(source, &ds->bin_pfx),
&ds->bin_pfx, ds->len, match_prefix, ds);
}
diff --git a/odb.c b/odb.c
index 2d06ab0bb85..87d84688c63 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_source_loose_free(source->loose);
- odb_clear_loose_cache(source);
loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index 49b398bedae..77104396afe 100644
--- a/odb.h
+++ b/odb.h
@@ -51,18 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_source_loose *loose;
- /*
- * Used to store the results of readdir(3) calls when we are OK
- * sacrificing accuracy due to races for speed. That includes
- * object existence with OBJECT_INFO_QUICK, as well as
- * our search for unique abbreviated hashes. Don't use it for tasks
- * requiring greater accuracy!
- *
- * Be sure to call odb_load_loose_cache() before using.
- */
- uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
- struct oidtree *loose_objects_cache;
-
/* Map between object IDs for loose objects. */
struct loose_object_map *loose_map;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 07/13] object-file: hide internals when we need to reprepare loose sources
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (5 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 08/13] object-file: move loose object map into loose source Patrick Steinhardt
` (6 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
There are two different situations where we have to clear the cache of
loose objects:
- When freeing the loose object source itself to avoid memory leaks.
- When repreparing the loose object source so that any potentially-
stale data is getting evicted from the cache.
The former is already handled by `odb_source_loose_free()`. But the
latter case is still done manually by in `odb_reprepare()`, so we are
leaking internals into that code.
Introduce a new `odb_source_loose_reprepare()` function as an equivalent
to `packfile_store_prepare()` to hide these implementation details.
Furthermore, while at it, rename the function `odb_clear_loose_cache()`
to `odb_source_loose_clear()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 17 +++++++++++------
object-file.h | 6 +++---
odb.c | 2 +-
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index fef00d6d3d..20daa629a1 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1834,12 +1834,17 @@ struct oidtree *odb_source_loose_cache(struct odb_source *source,
return source->loose->cache;
}
-void odb_clear_loose_cache(struct odb_source *source)
+static void odb_source_loose_clear_cache(struct odb_source_loose *loose)
{
- oidtree_clear(source->loose->cache);
- FREE_AND_NULL(source->loose->cache);
- memset(&source->loose->subdir_seen, 0,
- sizeof(source->loose->subdir_seen));
+ oidtree_clear(loose->cache);
+ FREE_AND_NULL(loose->cache);
+ memset(&loose->subdir_seen, 0,
+ sizeof(loose->subdir_seen));
+}
+
+void odb_source_loose_reprepare(struct odb_source *source)
+{
+ odb_source_loose_clear_cache(source->loose);
}
static int check_stream_oid(git_zstream *stream,
@@ -2008,6 +2013,6 @@ void odb_source_loose_free(struct odb_source_loose *loose)
{
if (!loose)
return;
- odb_clear_loose_cache(loose->source);
+ odb_source_loose_clear_cache(loose);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index 90da69cf5f..bec855e8e5 100644
--- a/object-file.h
+++ b/object-file.h
@@ -37,6 +37,9 @@ struct odb_source_loose {
struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
void odb_source_loose_free(struct odb_source_loose *loose);
+/* Reprepare the loose source by emptying the loose object cache. */
+void odb_source_loose_reprepare(struct odb_source *source);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -44,9 +47,6 @@ void odb_source_loose_free(struct odb_source_loose *loose);
struct oidtree *odb_source_loose_cache(struct odb_source *source,
const struct object_id *oid);
-/* Empty the loose object cache for the specified object directory. */
-void odb_clear_loose_cache(struct odb_source *source);
-
/*
* Put in `buf` the name of the file in the local object database that
* would be used to store a loose object with the specified oid.
diff --git a/odb.c b/odb.c
index 87d84688c6..b3e8d4a49c 100644
--- a/odb.c
+++ b/odb.c
@@ -1071,7 +1071,7 @@ void odb_reprepare(struct object_database *o)
odb_prepare_alternates(o);
for (source = o->sources; source; source = source->next)
- odb_clear_loose_cache(source);
+ odb_source_loose_reprepare(source);
o->approximate_object_count_valid = 0;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 08/13] object-file: move loose object map into loose source
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (6 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 09/13] object-file: read objects via the loose object source Patrick Steinhardt
` (5 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
The loose object map is used to map from the repository's canonical
object hash to the compatibility hash. As the name indicates, this map
is only used for loose objects, and as such it is tied to a specific
loose object source.
Same as with preceding commits, move this map into the loose object
source accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
loose.c | 10 +++++-----
object-file.c | 1 +
object-file.h | 3 +++
odb.c | 1 -
odb.h | 3 ---
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/loose.c b/loose.c
index 8cc7573ff2b..56cf64b648b 100644
--- a/loose.c
+++ b/loose.c
@@ -49,7 +49,7 @@ static int insert_loose_map(struct odb_source *source,
const struct object_id *oid,
const struct object_id *compat_oid)
{
- struct loose_object_map *map = source->loose_map;
+ struct loose_object_map *map = source->loose->map;
int inserted = 0;
inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
@@ -65,8 +65,8 @@ static int load_one_loose_object_map(struct repository *repo, struct odb_source
struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
FILE *fp;
- if (!source->loose_map)
- loose_object_map_init(&source->loose_map);
+ if (!source->loose->map)
+ loose_object_map_init(&source->loose->map);
if (!source->loose->cache) {
ALLOC_ARRAY(source->loose->cache, 1);
oidtree_init(source->loose->cache);
@@ -125,7 +125,7 @@ int repo_read_loose_object_map(struct repository *repo)
int repo_write_loose_object_map(struct repository *repo)
{
- kh_oid_map_t *map = repo->objects->sources->loose_map->to_compat;
+ kh_oid_map_t *map = repo->objects->sources->loose->map->to_compat;
struct lock_file lock;
int fd;
khiter_t iter;
@@ -231,7 +231,7 @@ int repo_loose_object_map_oid(struct repository *repo,
khiter_t pos;
for (source = repo->objects->sources; source; source = source->next) {
- struct loose_object_map *loose_map = source->loose_map;
+ struct loose_object_map *loose_map = source->loose->map;
if (!loose_map)
continue;
map = (to == repo->compat_hash_algo) ?
diff --git a/object-file.c b/object-file.c
index 20daa629a1d..ccc67713fad 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2014,5 +2014,6 @@ void odb_source_loose_free(struct odb_source_loose *loose)
if (!loose)
return;
odb_source_loose_clear_cache(loose);
+ loose_object_map_clear(&loose->map);
free(loose);
}
diff --git a/object-file.h b/object-file.h
index bec855e8e53..f8a96a45f57 100644
--- a/object-file.h
+++ b/object-file.h
@@ -32,6 +32,9 @@ struct odb_source_loose {
*/
uint32_t subdir_seen[8]; /* 256 bits */
struct oidtree *cache;
+
+ /* Map between object IDs for loose objects. */
+ struct loose_object_map *map;
};
struct odb_source_loose *odb_source_loose_new(struct odb_source *source);
diff --git a/odb.c b/odb.c
index b3e8d4a49cb..d1df9609e21 100644
--- a/odb.c
+++ b/odb.c
@@ -370,7 +370,6 @@ static void odb_source_free(struct odb_source *source)
{
free(source->path);
odb_source_loose_free(source->loose);
- loose_object_map_clear(&source->loose_map);
free(source);
}
diff --git a/odb.h b/odb.h
index 77104396afe..f9a3137a34a 100644
--- a/odb.h
+++ b/odb.h
@@ -51,9 +51,6 @@ struct odb_source {
/* Private state for loose objects. */
struct odb_source_loose *loose;
- /* Map between object IDs for loose objects. */
- struct loose_object_map *loose_map;
-
/*
* private data
*
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 09/13] object-file: read objects via the loose object source
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (7 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 08/13] object-file: move loose object map into loose source Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
` (4 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When reading an object via `loose_object_info()` or `map_loose_object()`
we hand in the whole repository. We then iterate through each of the
object sources to figure out whether that source has the object in
question.
This logic is reversing responsibility though: a specific backend should
only care about one specific source, where the object sources themselves
are then managed by the object database.
Refactor the code accordingly by passing an object source to both of
these functions instead. The different sources are then handled by
either `do_oid_object_info_extended()`, which sits on the object
database level, and by `open_istream_loose()`. The latter function
arguably is still at the wrong level, but this will be cleaned up at a
later point in time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 68 ++++++++++++++++++++++-------------------------------------
object-file.h | 15 +++++++------
odb.c | 9 ++++++--
streaming.c | 11 +++++++++-
4 files changed, 50 insertions(+), 53 deletions(-)
diff --git a/object-file.c b/object-file.c
index ccc67713fad..6d6e9a5a2ad 100644
--- a/object-file.c
+++ b/object-file.c
@@ -167,25 +167,22 @@ int stream_object_signature(struct repository *r, const struct object_id *oid)
}
/*
- * Find "oid" as a loose object in the local repository or in an alternate.
+ * Find "oid" as a loose object in given source.
* Returns 0 on success, negative on failure.
*
* The "path" out-parameter will give the path of the object we found (if any).
* Note that it may point to static storage and is only valid until another
* call to stat_loose_object().
*/
-static int stat_loose_object(struct repository *r, const struct object_id *oid,
+static int stat_loose_object(struct odb_source_loose *loose,
+ const struct object_id *oid,
struct stat *st, const char **path)
{
- struct odb_source *source;
static struct strbuf buf = STRBUF_INIT;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- if (!lstat(*path, st))
- return 0;
- }
+ *path = odb_loose_path(loose->source, &buf, oid);
+ if (!lstat(*path, st))
+ return 0;
return -1;
}
@@ -194,39 +191,24 @@ static int stat_loose_object(struct repository *r, const struct object_id *oid,
* Like stat_loose_object(), but actually open the object and return the
* descriptor. See the caveats on the "path" parameter above.
*/
-static int open_loose_object(struct repository *r,
+static int open_loose_object(struct odb_source_loose *loose,
const struct object_id *oid, const char **path)
{
- int fd;
- struct odb_source *source;
- int most_interesting_errno = ENOENT;
static struct strbuf buf = STRBUF_INIT;
+ int fd;
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- *path = odb_loose_path(source, &buf, oid);
- fd = git_open(*path);
- if (fd >= 0)
- return fd;
+ *path = odb_loose_path(loose->source, &buf, oid);
+ fd = git_open(*path);
+ if (fd >= 0)
+ return fd;
- if (most_interesting_errno == ENOENT)
- most_interesting_errno = errno;
- }
- errno = most_interesting_errno;
return -1;
}
-static int quick_has_loose(struct repository *r,
+static int quick_has_loose(struct odb_source_loose *loose,
const struct object_id *oid)
{
- struct odb_source *source;
-
- odb_prepare_alternates(r->objects);
- for (source = r->objects->sources; source; source = source->next) {
- if (oidtree_contains(odb_source_loose_cache(source, oid), oid))
- return 1;
- }
- return 0;
+ return !!oidtree_contains(odb_source_loose_cache(loose->source, oid), oid);
}
/*
@@ -252,12 +234,12 @@ static void *map_fd(int fd, const char *path, unsigned long *size)
return map;
}
-void *map_loose_object(struct repository *r,
- const struct object_id *oid,
- unsigned long *size)
+void *odb_source_loose_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size)
{
const char *p;
- int fd = open_loose_object(r, oid, &p);
+ int fd = open_loose_object(source->loose, oid, &p);
if (fd < 0)
return NULL;
@@ -407,9 +389,9 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
return 0;
}
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags)
+int odb_source_loose_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags)
{
int status = 0;
int fd;
@@ -422,7 +404,7 @@ int loose_object_info(struct repository *r,
enum object_type type_scratch;
if (oi->delta_base_oid)
- oidclr(oi->delta_base_oid, r->hash_algo);
+ oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
/*
* If we don't care about type or size, then we don't
@@ -435,15 +417,15 @@ int loose_object_info(struct repository *r,
if (!oi->typep && !oi->sizep && !oi->contentp) {
struct stat st;
if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(r, oid) ? 0 : -1;
- if (stat_loose_object(r, oid, &st, &path) < 0)
+ return quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
return 0;
}
- fd = open_loose_object(r, oid, &path);
+ fd = open_loose_object(source->loose, oid, &path);
if (fd < 0) {
if (errno != ENOENT)
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
diff --git a/object-file.h b/object-file.h
index f8a96a45f57..ca13d3d64e7 100644
--- a/object-file.h
+++ b/object-file.h
@@ -43,6 +43,14 @@ void odb_source_loose_free(struct odb_source_loose *loose);
/* Reprepare the loose source by emptying the loose object cache. */
void odb_source_loose_reprepare(struct odb_source *source);
+int odb_source_loose_read_object_info(struct odb_source *source,
+ const struct object_id *oid,
+ struct object_info *oi, int flags);
+
+void *odb_source_loose_map_object(struct odb_source *source,
+ const struct object_id *oid,
+ unsigned long *size);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,9 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
int has_loose_object(struct odb_source *source,
const struct object_id *oid);
-void *map_loose_object(struct repository *r, const struct object_id *oid,
- unsigned long *size);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
@@ -196,10 +201,6 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
*/
int stream_object_signature(struct repository *r, const struct object_id *oid);
-int loose_object_info(struct repository *r,
- const struct object_id *oid,
- struct object_info *oi, int flags);
-
enum finalize_object_file_flags {
FOF_SKIP_COLLISION_CHECK = 1,
};
diff --git a/odb.c b/odb.c
index d1df9609e21..4c0b4fdcd54 100644
--- a/odb.c
+++ b/odb.c
@@ -697,13 +697,18 @@ static int do_oid_object_info_extended(struct object_database *odb,
return 0;
}
+ odb_prepare_alternates(odb);
+
while (1) {
+ struct odb_source *source;
+
if (find_pack_entry(odb->repo, real, &e))
break;
/* Most likely it's a loose object. */
- if (!loose_object_info(odb->repo, real, oi, flags))
- return 0;
+ for (source = odb->sources; source; source = source->next)
+ if (!odb_source_loose_read_object_info(source, real, oi, flags))
+ return 0;
/* Not a loose object; someone else may have just packed it. */
if (!(flags & OBJECT_INFO_QUICK)) {
diff --git a/streaming.c b/streaming.c
index 4b13827668e..00ad649ae39 100644
--- a/streaming.c
+++ b/streaming.c
@@ -230,12 +230,21 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
enum object_type *type)
{
struct object_info oi = OBJECT_INFO_INIT;
+ struct odb_source *source;
+
oi.sizep = &st->size;
oi.typep = type;
- st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
+ st->u.loose.mapped = odb_source_loose_map_object(source, oid,
+ &st->u.loose.mapsize);
+ if (st->u.loose.mapped)
+ break;
+ }
if (!st->u.loose.mapped)
return -1;
+
switch (unpack_loose_header(&st->z, st->u.loose.mapped,
st->u.loose.mapsize, st->u.loose.hdr,
sizeof(st->u.loose.hdr))) {
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 10/13] object-file: rename `has_loose_object()`
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (8 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 09/13] object-file: read objects via the loose object source Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 11/13] object-file: refactor freshening of objects Patrick Steinhardt
` (3 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Rename `has_loose_object()` to `odb_source_loose_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/pack-objects.c | 4 ++--
object-file.c | 6 +++---
object-file.h | 16 ++++++++--------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5bdc44fb2de..2a448ab3585 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1716,7 +1716,7 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
*/
struct odb_source *source = the_repository->objects->sources->next;
for (; source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_source_loose_has_object(source, oid))
return 0;
}
@@ -3980,7 +3980,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
int found = 0;
for (; !found && source; source = source->next)
- if (has_loose_object(source, oid))
+ if (odb_source_loose_has_object(source, oid))
found = 1;
/*
diff --git a/object-file.c b/object-file.c
index 6d6e9a5a2ad..79e7ab8d2e3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -99,8 +99,8 @@ static int check_and_freshen_source(struct odb_source *source,
return check_and_freshen_file(path.buf, freshen);
}
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid)
+int odb_source_loose_has_object(struct odb_source *source,
+ const struct object_id *oid)
{
return check_and_freshen_source(source, oid, 0);
}
@@ -1161,7 +1161,7 @@ int force_object_loose(struct odb_source *source,
int ret;
for (struct odb_source *s = source->odb->sources; s; s = s->next)
- if (has_loose_object(s, oid))
+ if (odb_source_loose_has_object(s, oid))
return 0;
oi.typep = &type;
diff --git a/object-file.h b/object-file.h
index ca13d3d64e7..065a44bb8a0 100644
--- a/object-file.h
+++ b/object-file.h
@@ -51,6 +51,14 @@ void *odb_source_loose_map_object(struct odb_source *source,
const struct object_id *oid,
unsigned long *size);
+/*
+ * Return true iff an object database source has a loose object
+ * with the specified name. This function does not respect replace
+ * references.
+ */
+int odb_source_loose_has_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -66,14 +74,6 @@ const char *odb_loose_path(struct odb_source *source,
struct strbuf *buf,
const struct object_id *oid);
-/*
- * Return true iff an object database source has a loose object
- * with the specified name. This function does not respect replace
- * references.
- */
-int has_loose_object(struct odb_source *source,
- const struct object_id *oid);
-
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 11/13] object-file: refactor freshening of objects
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (9 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
` (2 subsequent siblings)
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
When writing an object that already exists in our object database we
skip the write and instead only update mtimes of the object, either in
its packed or loose object format. This logic is wholly contained in
"object-file.c", but that file is really only concerned with loose
objects. So it does not really make sense that it also contains the
logic to freshen a packed object.
Introduce a new `odb_freshen_object()` function that sits on the object
database level and two functions `packfile_store_freshen_object()` and
`odb_source_loose_freshen_object()`. Like this, the format-specific
functions can be part of their respective subsystems, while the backend
agnostic function to freshen an object sits at the object database
layer.
Note that this change also moves the logic that iterates through object
sources from the object source layer into the object database layer.
This change is intentional: object sources should ideally only have to
worry about themselves, and coordination of different sources should be
handled on the object database level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 33 +++++----------------------------
object-file.h | 3 +++
odb.c | 16 ++++++++++++++++
odb.h | 3 +++
packfile.c | 16 ++++++++++++++++
packfile.h | 3 +++
6 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/object-file.c b/object-file.c
index 79e7ab8d2e3..893c32adcdd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -968,30 +968,10 @@ static int write_loose_object(struct odb_source *source,
FOF_SKIP_COLLISION_CHECK);
}
-static int freshen_loose_object(struct object_database *odb,
- const struct object_id *oid)
+int odb_source_loose_freshen_object(struct odb_source *source,
+ const struct object_id *oid)
{
- odb_prepare_alternates(odb);
- for (struct odb_source *source = odb->sources; source; source = source->next)
- if (check_and_freshen_source(source, oid, 1))
- return 1;
- return 0;
-}
-
-static int freshen_packed_object(struct object_database *odb,
- const struct object_id *oid)
-{
- struct pack_entry e;
- if (!find_pack_entry(odb->repo, oid, &e))
- return 0;
- if (e.p->is_cruft)
- return 0;
- if (e.p->freshened)
- return 1;
- if (!freshen_file(e.p->pack_name))
- return 0;
- e.p->freshened = 1;
- return 1;
+ return !!check_and_freshen_source(source, oid, 1);
}
int stream_loose_object(struct odb_source *source,
@@ -1073,12 +1053,10 @@ int stream_loose_object(struct odb_source *source,
die(_("deflateEnd on stream object failed (%d)"), ret);
close_loose_object(source, fd, tmp_file.buf);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid)) {
+ if (odb_freshen_object(source->odb, oid)) {
unlink_or_warn(tmp_file.buf);
goto cleanup;
}
-
odb_loose_path(source, &filename, oid);
/* We finally know the object path, and create the missing dir. */
@@ -1137,8 +1115,7 @@ int write_object_file(struct odb_source *source,
* it out into .git/objects/??/?{38} file.
*/
write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
- if (freshen_packed_object(source->odb, oid) ||
- freshen_loose_object(source->odb, oid))
+ if (odb_freshen_object(source->odb, oid))
return 0;
if (write_loose_object(source, oid, hdr, hdrlen, buf, len, 0, flags))
return -1;
diff --git a/object-file.h b/object-file.h
index 065a44bb8a0..ee5b24cec66 100644
--- a/object-file.h
+++ b/object-file.h
@@ -59,6 +59,9 @@ void *odb_source_loose_map_object(struct odb_source *source,
int odb_source_loose_has_object(struct odb_source *source,
const struct object_id *oid);
+int odb_source_loose_freshen_object(struct odb_source *source,
+ const struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
diff --git a/odb.c b/odb.c
index 4c0b4fdcd54..17734bdaffe 100644
--- a/odb.c
+++ b/odb.c
@@ -987,6 +987,22 @@ int odb_has_object(struct object_database *odb, const struct object_id *oid,
return odb_read_object_info_extended(odb, oid, NULL, object_info_flags) >= 0;
}
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid)
+{
+ struct odb_source *source;
+
+ if (packfile_store_freshen_object(odb->packfiles, oid))
+ return 1;
+
+ odb_prepare_alternates(odb);
+ for (source = odb->sources; source; source = source->next)
+ if (odb_source_loose_freshen_object(source, oid))
+ return 1;
+
+ return 0;
+}
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect)
{
diff --git a/odb.h b/odb.h
index f9a3137a34a..2653247e0cc 100644
--- a/odb.h
+++ b/odb.h
@@ -396,6 +396,9 @@ int odb_has_object(struct object_database *odb,
const struct object_id *oid,
unsigned flags);
+int odb_freshen_object(struct object_database *odb,
+ const struct object_id *oid);
+
void odb_assert_oid_type(struct object_database *odb,
const struct object_id *oid, enum object_type expect);
diff --git a/packfile.c b/packfile.c
index 5a7caec2925..2ab49a0beb1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -819,6 +819,22 @@ struct packed_git *packfile_store_load_pack(struct packfile_store *store,
return p;
}
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid)
+{
+ struct pack_entry e;
+ if (!find_pack_entry(store->odb->repo, oid, &e))
+ return 0;
+ if (e.p->is_cruft)
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ if (utime(e.p->pack_name, NULL))
+ return 0;
+ e.p->freshened = 1;
+ return 1;
+}
+
void (*report_garbage)(unsigned seen_bits, const char *path);
static void report_helper(const struct string_list *list,
diff --git a/packfile.h b/packfile.h
index e7a5792b6cf..0ad080046f1 100644
--- a/packfile.h
+++ b/packfile.h
@@ -161,6 +161,9 @@ struct list_head *packfile_store_get_packs_mru(struct packfile_store *store);
struct packed_git *packfile_store_load_pack(struct packfile_store *store,
const char *idx_path, int local);
+int packfile_store_freshen_object(struct packfile_store *store,
+ const struct object_id *oid);
+
struct pack_window {
struct pack_window *next;
unsigned char *base;
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 12/13] object-file: rename `write_object_file()`
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (10 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 11/13] object-file: refactor freshening of objects Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
2025-11-03 13:39 ` [PATCH v3 00/13] Carve out loose object source Karthik Nayak
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
Rename `write_object_file()` to `odb_source_loose_write_object()` so
that it becomes clear that this is tied to a specific loose object
source. This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 8 ++++----
object-file.h | 10 +++++-----
odb.c | 3 ++-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/object-file.c b/object-file.c
index 893c32adcd..fdc644a427 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,10 +1084,10 @@ int stream_loose_object(struct odb_source *source,
return err;
}
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags)
+int odb_source_loose_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags)
{
const struct git_hash_algo *algo = source->odb->repo->hash_algo;
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
diff --git a/object-file.h b/object-file.h
index ee5b24cec6..36a60e15c4 100644
--- a/object-file.h
+++ b/object-file.h
@@ -62,6 +62,11 @@ int odb_source_loose_has_object(struct odb_source *source,
int odb_source_loose_freshen_object(struct odb_source *source,
const struct object_id *oid);
+int odb_source_loose_write_object(struct odb_source *source,
+ const void *buf, unsigned long len,
+ enum object_type type, struct object_id *oid,
+ struct object_id *compat_oid_in, unsigned flags);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -168,11 +173,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-int write_object_file(struct odb_source *source,
- const void *buf, unsigned long len,
- enum object_type type, struct object_id *oid,
- struct object_id *compat_oid_in, unsigned flags);
-
struct input_stream {
const void *(*read)(struct input_stream *, unsigned long *len);
void *data;
diff --git a/odb.c b/odb.c
index 17734bdaff..da44f1d63b 100644
--- a/odb.c
+++ b/odb.c
@@ -1021,7 +1021,8 @@ int odb_write_object_ext(struct object_database *odb,
struct object_id *compat_oid,
unsigned flags)
{
- return write_object_file(odb->sources, buf, len, type, oid, compat_oid, flags);
+ return odb_source_loose_write_object(odb->sources, buf, len, type,
+ oid, compat_oid, flags);
}
struct object_database *odb_new(struct repository *repo)
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 13/13] object-file: refactor writing objects via a stream
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (11 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
@ 2025-11-03 7:42 ` Patrick Steinhardt
2025-11-03 13:39 ` [PATCH v3 00/13] Carve out loose object source Karthik Nayak
13 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-11-03 7:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Karthik Nayak
We have two different ways to write an object into the database:
- We either provide the full buffer and write the object all at once.
- Or we provide an input stream that has a `read()` function so that
we can chunk the object.
The latter is especially used for large objects, where it may be too
expensive to hold the complete object in memory all at once.
While we already have `odb_write_object()` at the ODB-layer, we don't
have an equivalent for streaming an object. Introduce a new function
`odb_write_object_stream()` to address this gap so that callers don't
have to be aware of the inner workings of how to stream an object to
disk with a specific object source.
Rename `stream_loose_object()` to `odb_source_loose_write_stream()` to
clarify its scope. This matches our modern best practices around how to
name functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/unpack-objects.c | 7 +++----
object-file.c | 6 +++---
object-file.h | 14 ++++----------
odb.c | 7 +++++++
odb.h | 10 ++++++++++
5 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index ef79e43715d..6fc64e9e4b8 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -363,7 +363,7 @@ struct input_zstream_data {
int status;
};
-static const void *feed_input_zstream(struct input_stream *in_stream,
+static const void *feed_input_zstream(struct odb_write_stream *in_stream,
unsigned long *readlen)
{
struct input_zstream_data *data = in_stream->data;
@@ -393,7 +393,7 @@ static void stream_blob(unsigned long size, unsigned nr)
{
git_zstream zstream = { 0 };
struct input_zstream_data data = { 0 };
- struct input_stream in_stream = {
+ struct odb_write_stream in_stream = {
.read = feed_input_zstream,
.data = &data,
};
@@ -402,8 +402,7 @@ static void stream_blob(unsigned long size, unsigned nr)
data.zstream = &zstream;
git_inflate_init(&zstream);
- if (stream_loose_object(the_repository->objects->sources,
- &in_stream, size, &info->oid))
+ if (odb_write_object_stream(the_repository->objects, &in_stream, size, &info->oid))
die(_("failed to write object in stream"));
if (data.status != Z_STREAM_END)
diff --git a/object-file.c b/object-file.c
index fdc644a4275..811c569ed36 100644
--- a/object-file.c
+++ b/object-file.c
@@ -974,9 +974,9 @@ int odb_source_loose_freshen_object(struct odb_source *source,
return !!check_and_freshen_source(source, oid, 1);
}
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid)
+int odb_source_loose_write_stream(struct odb_source *source,
+ struct odb_write_stream *in_stream, size_t len,
+ struct object_id *oid)
{
const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
struct object_id compat_oid;
diff --git a/object-file.h b/object-file.h
index 36a60e15c40..eeffa67bbda 100644
--- a/object-file.h
+++ b/object-file.h
@@ -67,6 +67,10 @@ int odb_source_loose_write_object(struct odb_source *source,
enum object_type type, struct object_id *oid,
struct object_id *compat_oid_in, unsigned flags);
+int odb_source_loose_write_stream(struct odb_source *source,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
/*
* Populate and return the loose object cache array corresponding to the
* given object ID.
@@ -173,16 +177,6 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
-struct input_stream {
- const void *(*read)(struct input_stream *, unsigned long *len);
- void *data;
- int is_finished;
-};
-
-int stream_loose_object(struct odb_source *source,
- struct input_stream *in_stream, size_t len,
- struct object_id *oid);
-
int force_object_loose(struct odb_source *source,
const struct object_id *oid, time_t mtime);
diff --git a/odb.c b/odb.c
index da44f1d63b4..3ec21ef24e1 100644
--- a/odb.c
+++ b/odb.c
@@ -1025,6 +1025,13 @@ int odb_write_object_ext(struct object_database *odb,
oid, compat_oid, flags);
}
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid)
+{
+ return odb_source_loose_write_stream(odb->sources, stream, len, oid);
+}
+
struct object_database *odb_new(struct repository *repo)
{
struct object_database *o = xmalloc(sizeof(*o));
diff --git a/odb.h b/odb.h
index 2653247e0cc..9bb28008b1d 100644
--- a/odb.h
+++ b/odb.h
@@ -492,4 +492,14 @@ static inline int odb_write_object(struct object_database *odb,
return odb_write_object_ext(odb, buf, len, type, oid, NULL, 0);
}
+struct odb_write_stream {
+ const void *(*read)(struct odb_write_stream *, unsigned long *len);
+ void *data;
+ int is_finished;
+};
+
+int odb_write_object_stream(struct object_database *odb,
+ struct odb_write_stream *stream, size_t len,
+ struct object_id *oid);
+
#endif /* ODB_H */
--
2.51.2.1041.gc1ab5b90ca.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v3 00/13] Carve out loose object source
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
` (12 preceding siblings ...)
2025-11-03 7:42 ` [PATCH v3 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
@ 2025-11-03 13:39 ` Karthik Nayak
13 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2025-11-03 13:39 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1652 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
[snip]
> Range-diff versus v2:
>
> 1: 330b7c17e6a = 1: ed548a7ee4e odb: fix subtle logic to check whether an alternate is usable
> 2: 68b2e736d20 = 2: 59d5548ddfa odb: introduce `odb_source_new()`
> 3: b6f9a3f6d26 = 3: a417bd30153 odb: adjust naming to free object sources
> 4: fe38c58ab20 = 4: add2b7d112f object-file: move `fetch_if_missing`
> 5: 5fed6e7c429 ! 5: 5245ba0c6a5 object-file: introduce `struct odb_loose_source`
> @@ Metadata
> Author: Patrick Steinhardt <ps@pks.im>
>
> ## Commit message ##
> - object-file: introduce `struct odb_loose_source`
> + object-file: introduce `struct odb_source_loose`
>
> Currently, all state that relates to loose objects is held directly by
> - the `struct odb_source`. Introduce a new `struct odb_loose_source` to
> + the `struct odb_source`. Introduce a new `struct odb_source_loose` to
> hold the state instead so that it is entirely self-contained.
>
> This structure will eventually morph into the backend for accessing
> @@ object-file.c: void object_file_transaction_commit(struct odb_transaction *trans
> free(transaction);
> }
> +
> -+struct odb_loose_source *odb_loose_source_new(struct odb_source *source)
> ++struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
> +{
> -+ struct odb_loose_source *loose;
> ++ struct odb_source_loose *loose;
This is one of the points that Junio mentioned and I think the new
naming schemes flows more naturally into. Nice.
The range-diff looks good to me. Thanks.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2025-11-03 13:39 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-24 9:55 [PATCH 00/13] Carve out loose object source Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-10-24 16:21 ` Junio C Hamano
2025-10-30 10:34 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
2025-10-24 16:37 ` Junio C Hamano
2025-10-27 11:21 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 03/13] odb: adjust naming to free object sources Patrick Steinhardt
2025-10-30 10:41 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
2025-10-30 10:47 ` Karthik Nayak
2025-10-30 11:32 ` Patrick Steinhardt
2025-10-31 6:11 ` Patrick Steinhardt
2025-10-31 16:16 ` Junio C Hamano
2025-11-03 7:19 ` Patrick Steinhardt
2025-10-31 16:10 ` Junio C Hamano
2025-10-24 9:56 ` [PATCH 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
2025-10-24 21:44 ` Junio C Hamano
2025-10-27 11:21 ` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 08/13] object-file: move loose object map into loose source Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 09/13] object-file: read objects via the loose object source Patrick Steinhardt
2025-10-30 12:19 ` Karthik Nayak
2025-10-24 9:56 ` [PATCH 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 11/13] object-file: refactor freshening of objects Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
2025-10-24 9:56 ` [PATCH 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
2025-10-30 12:24 ` [PATCH 00/13] Carve out loose object source Karthik Nayak
2025-10-31 6:12 ` [PATCH v2 " Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 03/13] odb: adjust naming to free object sources Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 05/13] object-file: introduce `struct odb_loose_source` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 08/13] object-file: move loose object map into loose source Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 09/13] object-file: read objects via the loose object source Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 11/13] object-file: refactor freshening of objects Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
2025-10-31 6:12 ` [PATCH v2 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 00/13] Carve out loose object source Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 01/13] odb: fix subtle logic to check whether an alternate is usable Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 02/13] odb: introduce `odb_source_new()` Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 03/13] odb: adjust naming to free object sources Patrick Steinhardt
2025-11-03 7:41 ` [PATCH v3 04/13] object-file: move `fetch_if_missing` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 05/13] object-file: introduce `struct odb_source_loose` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 06/13] object-file: move loose object cache into loose source Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 07/13] object-file: hide internals when we need to reprepare loose sources Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 08/13] object-file: move loose object map into loose source Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 09/13] object-file: read objects via the loose object source Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 10/13] object-file: rename `has_loose_object()` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 11/13] object-file: refactor freshening of objects Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 12/13] object-file: rename `write_object_file()` Patrick Steinhardt
2025-11-03 7:42 ` [PATCH v3 13/13] object-file: refactor writing objects via a stream Patrick Steinhardt
2025-11-03 13:39 ` [PATCH v3 00/13] Carve out loose object source Karthik Nayak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).