From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v3 00/10] Split up "object-file.c"
Date: Tue, 15 Apr 2025 11:38:13 +0200 [thread overview]
Message-ID: <20250415-pks-split-object-file-v3-0-6aa7db7ad7b0@pks.im> (raw)
In-Reply-To: <20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im>
Hi,
"object-file.c" is quite a grab-bag of all kinds of different functions.
Many of these functions aren't really a good fit though and should be
owned by a different subsystem. This patch series tries to split up
concerns a bit better by splitting out this functionality into other
files:
- `safe_create_leading_directories()` is moved into "dir.c".
- `xmmap()` is moved into "wrapper.c".
- `git_open_cloexec()` is moved into "compat/open.c".
- Several functions attached to `struct index_state` are moved into
"read-cache.c".
- Several functions related to `struct object_store` are moved into a
new file "object-store.c".
"object-file.c" now mostly contains logic to read and write loose object
files, whereas "object-store.c" contains the higher-level logic to
manage different object directories for a repository. Eventually, these
will become the loose object backend as well as the `struct ref_store`
equivalent for objects, respectively.
The series is built on top of 9d22ac51228 (The third batch, 2025-04-07)
with ps/object-wo-the-repository at 9442b1c919a (Merge remote-tracking
branch 'junio/ps/object-wo-the-repository' into HEAD, 2025-04-08) merged
into it.
Changes in v2:
- Fix a grammar issue in one of the commit messages.
- Link to v1: https://lore.kernel.org/r/20250408-pks-split-object-file-v1-0-f1fd50191143@pks.im
Changes in v3:
- Rename `mkdir_in_gitdir()` to `safe_create_dir_in_gitdir()` to match
naming of similar functions.
- Move `safe_create_leading_directories()` et al into "path.c" instead
of into "dir.c". This also requires us to start injecting a repo via
parameters as "path.c" doesn't have `the_repository` available
anymore.
- Drop the commit that moves `index_blob_stream()` and related
functions.
- Expand the reasoning why we want to have cached objects per object
store instead of globally.
- Drop `index_blob_stream()`, which is a trivial wrapper around
`index_blob_bulk_checkin()`.
- Link to v2: https://lore.kernel.org/r/20250411-pks-split-object-file-v2-0-2bea0c9033ae@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (10):
object-file: move `mkdir_in_gitdir()` into "path.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `xmmap()` into "wrapper.c"
object-file: split out functions relating to object store subsystem
object-file: split up concerns of `HASH_*` flags
object-file: drop `index_blob_stream()`
object: split out functions relating to object store subsystem
object-store: remove global array of cached objects
object-store: merge "object-store-ll.h" and "object-store.h"
Makefile | 3 +-
apply.c | 2 +-
archive-tar.c | 2 +-
archive-zip.c | 2 +-
archive.c | 2 +-
attr.c | 2 +-
bisect.c | 2 +-
blame.c | 4 +-
builtin/backfill.c | 2 +-
builtin/blame.c | 2 +-
builtin/bugreport.c | 4 +-
builtin/cat-file.c | 2 +-
builtin/checkout.c | 3 +-
builtin/clone.c | 6 +-
builtin/commit-graph.c | 2 +-
builtin/commit-tree.c | 2 +-
builtin/count-objects.c | 2 +-
builtin/credential-cache--daemon.c | 4 +-
builtin/describe.c | 2 +-
builtin/diagnose.c | 4 +-
builtin/difftool.c | 31 +-
builtin/fast-export.c | 2 +-
builtin/fast-import.c | 4 +-
builtin/fetch.c | 2 +-
builtin/fsck.c | 4 +-
builtin/gc.c | 9 +-
builtin/grep.c | 2 +-
builtin/hash-object.c | 25 +-
builtin/index-pack.c | 2 +-
builtin/init-db.c | 3 +-
builtin/log.c | 6 +-
builtin/ls-tree.c | 2 +-
builtin/merge-file.c | 1 +
builtin/merge-tree.c | 2 +-
builtin/mktag.c | 2 +-
builtin/mktree.c | 3 +-
builtin/multi-pack-index.c | 2 +-
builtin/mv.c | 3 +-
builtin/notes.c | 3 +-
builtin/pack-objects.c | 2 +-
builtin/pack-redundant.c | 2 +-
builtin/prune.c | 2 +-
builtin/rebase.c | 3 +-
builtin/receive-pack.c | 3 +-
builtin/remote.c | 2 +-
builtin/repack.c | 2 +-
builtin/replace.c | 4 +-
builtin/rev-list.c | 2 +-
builtin/show-ref.c | 2 +-
builtin/sparse-checkout.c | 5 +-
builtin/submodule--helper.c | 6 +-
builtin/tag.c | 3 +-
builtin/unpack-file.c | 3 +-
builtin/unpack-objects.c | 3 +-
builtin/update-index.c | 2 +-
builtin/worktree.c | 8 +-
bulk-checkin.c | 8 +-
bulk-checkin.h | 15 +
bundle-uri.c | 2 +-
bundle.c | 2 +-
cache-tree.c | 4 +-
combine-diff.c | 2 +-
commit-graph.c | 5 +-
commit-graph.h | 2 +-
commit.c | 3 +-
compat/open.c | 29 +
config.c | 2 +-
connected.c | 2 +-
convert.c | 2 +-
diagnose.c | 2 +-
diff.c | 2 +-
diffcore-rename.c | 2 +-
dir.c | 7 +-
entry.c | 2 +-
fetch-pack.c | 2 +-
fmt-merge-msg.c | 2 +-
fsck.c | 2 +-
git-compat-util.h | 3 +
grep.c | 2 +-
http-backend.c | 2 +-
http-push.c | 3 +-
http-walker.c | 2 +-
http.c | 2 +-
list-objects-filter.c | 2 +-
list-objects.c | 2 +-
log-tree.c | 2 +-
mailmap.c | 2 +-
match-trees.c | 3 +-
merge-blobs.c | 2 +-
merge-ort.c | 3 +-
merge-recursive.c | 6 +-
meson.build | 2 +
midx-write.c | 2 +-
midx.c | 1 -
notes-cache.c | 3 +-
notes-merge.c | 8 +-
notes.c | 3 +-
object-file.c | 1220 +-----------------------------------
object-file.h | 121 ++--
object-name.c | 2 +-
object-store-ll.h | 556 ----------------
object-store.c | 1050 +++++++++++++++++++++++++++++++
object-store.h | 516 ++++++++++++++-
object.c | 67 --
oss-fuzz/fuzz-pack-idx.c | 2 +-
pack-bitmap-write.c | 2 +-
pack-bitmap.c | 3 +-
pack-check.c | 2 +-
pack-mtimes.c | 3 +-
pack-objects.h | 2 +-
pack-revindex.c | 3 +-
packfile.c | 2 +-
path.c | 111 +++-
path.h | 45 ++
promisor-remote.c | 2 +-
protocol-caps.c | 2 +-
prune-packed.c | 2 +-
reachable.c | 2 +-
read-cache.c | 6 +-
ref-filter.c | 2 +-
reflog.c | 2 +-
refs.c | 2 +-
refs/files-backend.c | 4 +-
remote.c | 2 +-
replace-object.c | 2 +-
replace-object.h | 2 +-
repository.c | 2 +-
rerere.c | 7 +-
revision.c | 2 +-
send-pack.c | 2 +-
sequencer.c | 6 +-
server-info.c | 4 +-
shallow.c | 2 +-
streaming.c | 2 +-
submodule-config.c | 2 +-
submodule.c | 4 +-
t/helper/test-pack-mtimes.c | 2 +-
t/helper/test-partial-clone.c | 2 +-
t/helper/test-read-graph.c | 2 +-
t/helper/test-read-midx.c | 2 +-
t/helper/test-ref-store.c | 2 +-
tag.c | 2 +-
tmp-objdir.c | 2 +-
tree-walk.c | 2 +-
tree.c | 2 +-
unpack-trees.c | 2 +-
upload-pack.c | 2 +-
walker.c | 2 +-
wrapper.c | 48 ++
xdiff-interface.c | 2 +-
150 files changed, 2147 insertions(+), 2067 deletions(-)
Range-diff versus v2:
1: ed337338970 < -: ----------- object-file: move `safe_create_leading_directories()` into "dir.c"
-: ----------- > 1: 8d838f92936 object-file: move `mkdir_in_gitdir()` into "path.c"
-: ----------- > 2: c7c723db86c object-file: move `safe_create_leading_directories()` into "path.c"
2: 48730ada01e = 3: 2b4db7b2090 object-file: move `git_open_cloexec()` to "compat/open.c"
3: a9436d9e4a6 = 4: 7e8b750d652 object-file: move `xmmap()` into "wrapper.c"
4: 628d0bcce6c ! 5: 3b90e8a5841 object-file: split out functions relating to object store subsystem
@@ object-file.c: static int get_conv_flags(unsigned flags)
return 0;
}
--
-int odb_mkstemp(struct strbuf *temp_filename, const char *pattern)
-{
- int fd;
@@ object-file.c: static int get_conv_flags(unsigned flags)
- /* slow path */
- /* some mkstemp implementations erase temp_filename on failure */
- repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern);
-- safe_create_leading_directories(temp_filename->buf);
+- safe_create_leading_directories(the_repository, temp_filename->buf);
- return xmkstemp_mode(temp_filename->buf, mode);
-}
-
@@ object-file.c: static int get_conv_flags(unsigned flags)
- return fd;
-
- /* slow path */
-- safe_create_leading_directories_const(name);
+- safe_create_leading_directories_const(the_repository, name);
- return open(name, O_RDWR|O_CREAT|O_EXCL, 0600);
-}
-
@@ object-store.c (new)
+ /* slow path */
+ /* some mkstemp implementations erase temp_filename on failure */
+ repo_git_path_replace(the_repository, temp_filename, "objects/%s", pattern);
-+ safe_create_leading_directories(temp_filename->buf);
++ safe_create_leading_directories(the_repository, temp_filename->buf);
+ return xmkstemp_mode(temp_filename->buf, mode);
+}
+
@@ object-store.c (new)
+ return fd;
+
+ /* slow path */
-+ safe_create_leading_directories_const(name);
++ safe_create_leading_directories_const(the_repository, name);
+ return open(name, O_RDWR|O_CREAT|O_EXCL, 0600);
+}
+
5: fd102d9e50a = 6: 6bc5d4811d3 object-file: split up concerns of `HASH_*` flags
6: 298a867897d < -: ----------- object-file: split out functions relating to index subsystem
-: ----------- > 7: c1e323f1673 object-file: drop `index_blob_stream()`
7: 53e2bd9d117 = 8: 6b46541fe2a object: split out functions relating to object store subsystem
8: 00b4a52b8ce ! 9: 59df5f22b9d object-store: remove global array of cached objects
@@ Commit message
object-store: remove global array of cached objects
Cached objects are virtual objects that can be set up without writing
- anything into the object store directly. This mechanism for example
- allows us to create fake commits in git-blame(1).
+ anything into the object store directly, which is used by git-blame(1)
+ to create fake commits for the working tree.
- The cached objects are stored in a global variable. Refactor the code so
- that we instead store the array as part of the raw object store. This is
- another step into the direction of libifying our object database.
+ These cached objects are stored in a global variable, which is another
+ roadblock for libification of the object subsystem. Refactor the code so
+ that we instead store the array as part of the raw object store.
+
+ This refactoring raises the question whether virtual objects should
+ really be specific to a single repository (or rather a single object
+ store). Hypothetical usecases might for example span across submodules,
+ and here it may or may not be the right thing to provide virtual objects
+ across submodule boundaries.
+
+ The only existing usecase is git-blame(1) though, which does not know to
+ blame across submodule boundaries in the first place. As such, storing
+ these objects both globally and per-repository would achieve the same
+ result right now. But arguably, if we learned to blame across submodule
+ boundaries, we would likely want to create separate fare working tree
+ commits for each of the submodules so that the user can learn which
+ worktree a specific uncommitted change belongs to. And even if we would
+ want to create the same fake commit for each of the submodules we could
+ do that when storing separate virtual objects per object store.
+
+ While this is all rather hypothetical, the takeaway is that handling
+ virtual objects per-object store gives us more flexibility compared to
+ storing them globally. In a hypothetical future where we have achieved
+ full libification one might be able to handle unrelated repositories in
+ a single process, where the state of one repository should not have an
+ impact on the state of another repository. As such, storing these cached
+ objects per object store will enable more usecases and should lead to
+ less surprising outcomes overall.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
9: 6365dc7cd16 ! 10: df6a749245f object-store: merge "object-store-ll.h" and "object-store.h"
@@ builtin/replace.c
#include "object-name.h"
-#include "object-store-ll.h"
+#include "object-store.h"
- #include "read-cache.h"
#include "replace-object.h"
#include "tag.h"
+ #include "wildmatch.h"
## builtin/rev-list.c ##
@@
@@ notes-merge.c
-#include "object-store-ll.h"
+#include "object-store.h"
#include "path.h"
- #include "read-cache.h"
#include "repository.h"
+ #include "diff.h"
## object-file.c ##
@@
-
#include "git-compat-util.h"
#include "bulk-checkin.h"
+ #include "convert.h"
+#include "dir.h"
#include "environment.h"
+ #include "fsck.h"
#include "gettext.h"
- #include "hex.h"
## object-name.c ##
@@
---
base-commit: 9442b1c919af9aed513eb0a484fe96358a500cf5
change-id: 20250408-pks-split-object-file-c61d7cd2a21f
next prev parent reply other threads:[~2025-04-15 9:38 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-08 10:24 [PATCH 0/9] Split up "object-file.c" Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 1/9] object-file: move `safe_create_leading_directories()` into "dir.c" Patrick Steinhardt
2025-04-09 14:36 ` Elijah Newren
2025-04-11 9:27 ` Patrick Steinhardt
2025-04-11 17:11 ` Elijah Newren
2025-04-15 9:19 ` Patrick Steinhardt
2025-04-15 15:11 ` Junio C Hamano
2025-04-08 10:24 ` [PATCH 2/9] object-file: move `git_open_cloexec()` to "compat/open.c" Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 3/9] object-file: move `xmmap()` into "wrapper.c" Patrick Steinhardt
2025-04-09 14:36 ` Elijah Newren
2025-04-08 10:24 ` [PATCH 4/9] object-file: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 5/9] object-file: split up concerns of `HASH_*` flags Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 6/9] object-file: split out functions relating to index subsystem Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 7/9] object: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 8/9] object-store: remove global array of cached objects Patrick Steinhardt
2025-04-08 10:24 ` [PATCH 9/9] object-store: merge "object-store-ll.h" and "object-store.h" Patrick Steinhardt
2025-04-08 23:29 ` [PATCH 0/9] Split up "object-file.c" Junio C Hamano
2025-04-11 9:26 ` Patrick Steinhardt
2025-04-09 14:42 ` Elijah Newren
2025-04-11 9:27 ` Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 " Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 1/9] object-file: move `safe_create_leading_directories()` into "dir.c" Patrick Steinhardt
2025-04-11 20:09 ` Junio C Hamano
2025-04-11 21:29 ` Eric Sunshine
2025-04-15 9:19 ` Patrick Steinhardt
2025-04-15 15:09 ` Junio C Hamano
2025-04-11 9:29 ` [PATCH v2 2/9] object-file: move `git_open_cloexec()` to "compat/open.c" Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 3/9] object-file: move `xmmap()` into "wrapper.c" Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 4/9] object-file: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 5/9] object-file: split up concerns of `HASH_*` flags Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 6/9] object-file: split out functions relating to index subsystem Patrick Steinhardt
2025-04-12 8:17 ` Jeff King
2025-04-14 11:49 ` Junio C Hamano
2025-04-15 9:19 ` Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 7/9] object: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 8/9] object-store: remove global array of cached objects Patrick Steinhardt
2025-04-11 22:58 ` Junio C Hamano
2025-04-15 9:19 ` Patrick Steinhardt
2025-04-11 9:29 ` [PATCH v2 9/9] object-store: merge "object-store-ll.h" and "object-store.h" Patrick Steinhardt
2025-04-15 9:38 ` Patrick Steinhardt [this message]
2025-04-15 9:38 ` [PATCH v3 01/10] object-file: move `mkdir_in_gitdir()` into "path.c" Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 02/10] object-file: move `safe_create_leading_directories()` " Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 03/10] object-file: move `git_open_cloexec()` to "compat/open.c" Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 04/10] object-file: move `xmmap()` into "wrapper.c" Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 05/10] object-file: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 06/10] object-file: split up concerns of `HASH_*` flags Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 07/10] object-file: drop `index_blob_stream()` Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 08/10] object: split out functions relating to object store subsystem Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 09/10] object-store: remove global array of cached objects Patrick Steinhardt
2025-04-15 9:38 ` [PATCH v3 10/10] object-store: merge "object-store-ll.h" and "object-store.h" Patrick Steinhardt
2025-04-16 6:41 ` [PATCH v3 00/10] Split up "object-file.c" Elijah Newren
2025-04-16 7:44 ` Patrick Steinhardt
2025-04-16 16:21 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250415-pks-split-object-file-v3-0-6aa7db7ad7b0@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).