From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v4 0/7] merge-ort: implement support for packing objects together
Date: Thu, 19 Oct 2023 13:28:38 -0400 [thread overview]
Message-ID: <cover.1697736516.git.me@ttaylorr.com> (raw)
(Rebased onto the current tip of 'master', which is 813d9a9188 (The
nineteenth batch, 2023-10-18) at the time of writing).
This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.
I intentionally broke this off from the existing thread, since I
accidentally rerolled mine and Jonathan Tan's Bloom v2 series into it,
causing some confusion.
This is a new round that is significantly simplified thanks to
another very helpful suggestion[1] from Junio. By factoring out a common
"deflate object to pack" that takes an abstract bulk_checkin_source as a
parameter, all of the earlier refactorings can be dropped since we
retain only a single caller instead of multiple.
This resulted in a rather satisfying range-diff (included below, as
usual), and a similarly satisfying inter-diff:
$ git diff --stat tb/ort-bulk-checkin.v3..
bulk-checkin.c | 203 ++++++++++++++++---------------------------------
1 file changed, 64 insertions(+), 139 deletions(-)
Beyond that, the changes since last time can be viewed in the range-diff
below. Thanks in advance for any review!
[1]: https://lore.kernel.org/git/xmqq34y7plj4.fsf@gitster.g/
Taylor Blau (7):
bulk-checkin: extract abstract `bulk_checkin_source`
bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
bulk-checkin: refactor deflate routine to accept a
`bulk_checkin_source`
bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
builtin/merge-tree.c: implement support for `--write-pack`
Documentation/git-merge-tree.txt | 4 +
builtin/merge-tree.c | 5 +
bulk-checkin.c | 161 ++++++++++++++++++++++++++-----
bulk-checkin.h | 8 ++
merge-ort.c | 42 ++++++--
merge-recursive.h | 1 +
t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++
7 files changed, 280 insertions(+), 34 deletions(-)
Range-diff against v3:
1: 2dffa45183 < -: ---------- bulk-checkin: factor out `format_object_header_hash()`
2: 7a10dc794a < -: ---------- bulk-checkin: factor out `prepare_checkpoint()`
3: 20c32d2178 < -: ---------- bulk-checkin: factor out `truncate_checkpoint()`
4: 893051d0b7 < -: ---------- bulk-checkin: factor out `finalize_checkpoint()`
5: da52ec8380 ! 1: 97bb6e9f59 bulk-checkin: extract abstract `bulk_checkin_source`
@@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
if (*already_hashed_to < offset) {
size_t hsize = offset - *already_hashed_to;
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
- git_hash_ctx ctx;
+ unsigned header_len;
struct hashfile_checkpoint checkpoint = {0};
struct pack_idx_entry *idx = NULL;
+ struct bulk_checkin_source source = {
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
seekback = lseek(fd, 0, SEEK_CUR);
if (seekback == (off_t) -1)
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
- while (1) {
- prepare_checkpoint(state, &checkpoint, idx, flags);
+ crc32_begin(state->f);
+ }
if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
- fd, size, path, flags))
+ &source, flags))
break;
- truncate_checkpoint(state, &checkpoint, idx);
+ /*
+ * Writing this object to the current pack will make
+@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
+ hashfile_truncate(state->f, &checkpoint);
+ state->offset = checkpoint.offset;
+ flush_bulk_checkin_packfile(state);
- if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
+ if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
return error("cannot seek back");
}
- finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
+ the_hash_algo->final_oid_fn(result_oid, &ctx);
7: 04ec74e357 ! 2: 9d633df339 bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
@@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
s.avail_out = sizeof(obuf) - hdrlen;
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
-
- while (1) {
- prepare_checkpoint(state, &checkpoint, idx, flags);
+ idx->offset = state->offset;
+ crc32_begin(state->f);
+ }
- if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
- &source, flags))
+ if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
+ &source, OBJ_BLOB, flags))
break;
- truncate_checkpoint(state, &checkpoint, idx);
- if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
+ /*
+ * Writing this object to the current pack will make
-: ---------- > 3: d5bbd7810e bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source`
6: 4e9bac5bc1 = 4: e427fe6ad3 bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
8: 8667b76365 ! 5: 48095afe80 bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
@@ Commit message
objects individually as loose.
Similar to the existing `index_blob_bulk_checkin()` function, the
- entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
- responsible for formatting the pack header and then deflating the
- contents into the pack. The latter is accomplished by calling
- deflate_obj_contents_to_pack_incore(), which takes advantage of the
- earlier refactorings and is responsible for writing the object to the
- pack and handling any overage from pack.packSizeLimit.
-
- The bulk of the new functionality is implemented in the function
- `stream_obj_to_pack()`, which can handle streaming objects from memory
- to the bulk-checkin pack as a result of the earlier refactoring.
+ entrypoint delegates to `deflate_obj_to_pack_incore()`. That function in
+ turn delegates to deflate_obj_to_pack(), which is responsible for
+ formatting the pack header and then deflating the contents into the
+ pack.
Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
@@ Commit message
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## bulk-checkin.c ##
-@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *state,
- }
+@@ bulk-checkin.c: static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
+ return 0;
}
-+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
-+ git_hash_ctx *ctx,
-+ struct hashfile_checkpoint *checkpoint,
-+ struct object_id *result_oid,
-+ const void *buf, size_t size,
-+ enum object_type type,
-+ const char *path, unsigned flags)
++static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state,
++ struct object_id *result_oid,
++ const void *buf, size_t size,
++ const char *path, enum object_type type,
++ unsigned flags)
+{
-+ struct pack_idx_entry *idx = NULL;
-+ off_t already_hashed_to = 0;
+ struct bulk_checkin_source source = {
+ .type = SOURCE_INCORE,
+ .buf = buf,
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+ .path = path,
+ };
+
-+ /* Note: idx is non-NULL when we are writing */
-+ if (flags & HASH_WRITE_OBJECT)
-+ CALLOC_ARRAY(idx, 1);
-+
-+ while (1) {
-+ prepare_checkpoint(state, checkpoint, idx, flags);
-+
-+ if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source,
-+ type, flags))
-+ break;
-+ truncate_checkpoint(state, checkpoint, idx);
-+ bulk_checkin_source_seek_to(&source, 0);
-+ }
-+
-+ finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
-+
-+ return 0;
-+}
-+
-+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
-+ struct object_id *result_oid,
-+ const void *buf, size_t size,
-+ const char *path, unsigned flags)
-+{
-+ git_hash_ctx ctx;
-+ struct hashfile_checkpoint checkpoint = {0};
-+
-+ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
-+ size);
-+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
-+ result_oid, buf, size,
-+ OBJ_BLOB, path, flags);
++ return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
@@ bulk-checkin.c: int index_blob_bulk_checkin(struct object_id *oid,
+ const void *buf, size_t size,
+ const char *path, unsigned flags)
+{
-+ int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
-+ buf, size, path, flags);
++ int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
++ buf, size, path, OBJ_BLOB,
++ flags);
+ if (!odb_transaction_nesting)
+ flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+ return status;
9: cba043ef14 ! 6: 60568f9281 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
@@ Commit message
machinery will have enough to keep track of the converted object's hash
in order to update the compatibility mapping.
- Within `deflate_tree_to_pack_incore()`, the changes should be limited
- to something like:
+ Within some thin wrapper around `deflate_obj_to_pack_incore()` (perhaps
+ `deflate_tree_to_pack_incore()`), the changes should be limited to
+ something like:
struct strbuf converted = STRBUF_INIT;
if (the_repository->compat_hash_algo) {
@@ Commit message
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## bulk-checkin.c ##
-@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
- OBJ_BLOB, path, flags);
- }
-
-+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
-+ struct object_id *result_oid,
-+ const void *buf, size_t size,
-+ const char *path, unsigned flags)
-+{
-+ git_hash_ctx ctx;
-+ struct hashfile_checkpoint checkpoint = {0};
-+
-+ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
-+ size);
-+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
-+ result_oid, buf, size,
-+ OBJ_TREE, path, flags);
-+}
-+
- static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
- struct object_id *result_oid,
- int fd, size_t size,
@@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
return status;
}
@@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
+ const void *buf, size_t size,
+ const char *path, unsigned flags)
+{
-+ int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
-+ buf, size, path, flags);
++ int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
++ buf, size, path, OBJ_TREE,
++ flags);
+ if (!odb_transaction_nesting)
+ flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+ return status;
10: ae70508037 = 7: b9be9df122 builtin/merge-tree.c: implement support for `--write-pack`
--
2.42.0.405.g86fe3250c2
next reply other threads:[~2023-10-19 17:28 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-19 17:28 Taylor Blau [this message]
2023-10-19 17:28 ` [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-20 7:35 ` Jeff King
2023-10-20 16:55 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-19 17:28 ` [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-23 9:19 ` Patrick Steinhardt
2023-10-23 18:58 ` Jeff King
2023-10-24 6:34 ` Patrick Steinhardt
2023-10-24 17:08 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:29 ` [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-19 21:47 ` [PATCH v4 0/7] merge-ort: implement support for packing objects together Junio C Hamano
2023-10-20 7:29 ` Jeff King
2023-10-20 16:53 ` Junio C Hamano
2023-10-23 9:19 ` Patrick Steinhardt
2023-10-23 22:44 ` [PATCH v5 0/5] " Taylor Blau
2023-10-23 22:44 ` [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-25 7:37 ` Jeff King
2023-10-25 15:39 ` Taylor Blau
2023-10-27 23:12 ` Junio C Hamano
2023-10-23 22:44 ` [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-23 22:45 ` [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:44 ` Taylor Blau
2023-10-25 17:21 ` Eric Sunshine
2023-10-26 8:16 ` Patrick Steinhardt
2023-11-11 0:17 ` Elijah Newren
2023-10-23 22:45 ` [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-23 22:45 ` [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:46 ` Taylor Blau
2023-11-10 23:51 ` Elijah Newren
2023-11-11 0:27 ` Junio C Hamano
2023-11-11 1:34 ` Taylor Blau
2023-11-11 1:24 ` Taylor Blau
2023-11-13 22:05 ` Jeff King
2023-11-14 1:40 ` Junio C Hamano
2023-11-14 2:54 ` Elijah Newren
2023-11-14 21:55 ` Jeff King
2023-11-14 3:08 ` Elijah Newren
2023-11-13 22:02 ` Jeff King
2023-11-13 22:34 ` Taylor Blau
2023-11-14 2:50 ` Elijah Newren
2023-11-14 21:53 ` Jeff King
2023-11-14 22:04 ` Jeff King
2023-10-23 23:31 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Junio C Hamano
2023-11-06 15:46 ` Johannes Schindelin
2023-11-06 23:19 ` Junio C Hamano
2023-11-07 3:42 ` Jeff King
2023-11-07 15:58 ` Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 1/3] merge-ort.c: finalize ODB transactions after each step Taylor Blau
2023-11-11 3:45 ` Elijah Newren
2023-11-07 18:22 ` [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()` Taylor Blau
2023-11-08 7:05 ` Patrick Steinhardt
2023-11-09 19:26 ` Taylor Blau
2023-11-07 18:23 ` [RFC PATCH 3/3] builtin/replay.c: introduce `--write-pack` Taylor Blau
2023-11-11 3:42 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Elijah Newren
2023-11-11 4:04 ` Elijah Newren
2023-10-25 7:58 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1697736516.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=ebiederm@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).