From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v4 0/7] merge-ort: implement support for packing objects together
Date: Mon, 23 Oct 2023 11:19:19 +0200 [thread overview]
Message-ID: <ZTY6lwtMgtRwmMrB@tanuki> (raw)
In-Reply-To: <cover.1697736516.git.me@ttaylorr.com>
[-- Attachment #1: Type: text/plain, Size: 13253 bytes --]
On Thu, Oct 19, 2023 at 01:28:38PM -0400, Taylor Blau wrote:
> (Rebased onto the current tip of 'master', which is 813d9a9188 (The
> nineteenth batch, 2023-10-18) at the time of writing).
>
> This series implements support for a new merge-tree option,
> `--write-pack`, which causes any newly-written objects to be packed
> together instead of being stored individually as loose.
>
> I intentionally broke this off from the existing thread, since I
> accidentally rerolled mine and Jonathan Tan's Bloom v2 series into it,
> causing some confusion.
>
> This is a new round that is significantly simplified thanks to
> another very helpful suggestion[1] from Junio. By factoring out a common
> "deflate object to pack" that takes an abstract bulk_checkin_source as a
> parameter, all of the earlier refactorings can be dropped since we
> retain only a single caller instead of multiple.
>
> This resulted in a rather satisfying range-diff (included below, as
> usual), and a similarly satisfying inter-diff:
>
> $ git diff --stat tb/ort-bulk-checkin.v3..
> bulk-checkin.c | 203 ++++++++++++++++---------------------------------
> 1 file changed, 64 insertions(+), 139 deletions(-)
>
> Beyond that, the changes since last time can be viewed in the range-diff
> below. Thanks in advance for any review!
>
> [1]: https://lore.kernel.org/git/xmqq34y7plj4.fsf@gitster.g/
A single question regarding an `assert()` from my side. Other than that
the series looks good to me, thanks.
Patrick
> Taylor Blau (7):
> bulk-checkin: extract abstract `bulk_checkin_source`
> bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
> bulk-checkin: refactor deflate routine to accept a
> `bulk_checkin_source`
> bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
> bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
> bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
> builtin/merge-tree.c: implement support for `--write-pack`
>
> Documentation/git-merge-tree.txt | 4 +
> builtin/merge-tree.c | 5 +
> bulk-checkin.c | 161 ++++++++++++++++++++++++++-----
> bulk-checkin.h | 8 ++
> merge-ort.c | 42 ++++++--
> merge-recursive.h | 1 +
> t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++
> 7 files changed, 280 insertions(+), 34 deletions(-)
>
> Range-diff against v3:
> 1: 2dffa45183 < -: ---------- bulk-checkin: factor out `format_object_header_hash()`
> 2: 7a10dc794a < -: ---------- bulk-checkin: factor out `prepare_checkpoint()`
> 3: 20c32d2178 < -: ---------- bulk-checkin: factor out `truncate_checkpoint()`
> 4: 893051d0b7 < -: ---------- bulk-checkin: factor out `finalize_checkpoint()`
> 5: da52ec8380 ! 1: 97bb6e9f59 bulk-checkin: extract abstract `bulk_checkin_source`
> @@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
> if (*already_hashed_to < offset) {
> size_t hsize = offset - *already_hashed_to;
> @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> - git_hash_ctx ctx;
> + unsigned header_len;
> struct hashfile_checkpoint checkpoint = {0};
> struct pack_idx_entry *idx = NULL;
> + struct bulk_checkin_source source = {
> @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
> seekback = lseek(fd, 0, SEEK_CUR);
> if (seekback == (off_t) -1)
> @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> - while (1) {
> - prepare_checkpoint(state, &checkpoint, idx, flags);
> + crc32_begin(state->f);
> + }
> if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
> - fd, size, path, flags))
> + &source, flags))
> break;
> - truncate_checkpoint(state, &checkpoint, idx);
> + /*
> + * Writing this object to the current pack will make
> +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> + hashfile_truncate(state->f, &checkpoint);
> + state->offset = checkpoint.offset;
> + flush_bulk_checkin_packfile(state);
> - if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
> + if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
> return error("cannot seek back");
> }
> - finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
> + the_hash_algo->final_oid_fn(result_oid, &ctx);
> 7: 04ec74e357 ! 2: 9d633df339 bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
> @@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
> s.avail_out = sizeof(obuf) - hdrlen;
>
> @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> -
> - while (1) {
> - prepare_checkpoint(state, &checkpoint, idx, flags);
> + idx->offset = state->offset;
> + crc32_begin(state->f);
> + }
> - if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
> - &source, flags))
> + if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
> + &source, OBJ_BLOB, flags))
> break;
> - truncate_checkpoint(state, &checkpoint, idx);
> - if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
> + /*
> + * Writing this object to the current pack will make
> -: ---------- > 3: d5bbd7810e bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source`
> 6: 4e9bac5bc1 = 4: e427fe6ad3 bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
> 8: 8667b76365 ! 5: 48095afe80 bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
> @@ Commit message
> objects individually as loose.
>
> Similar to the existing `index_blob_bulk_checkin()` function, the
> - entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
> - responsible for formatting the pack header and then deflating the
> - contents into the pack. The latter is accomplished by calling
> - deflate_obj_contents_to_pack_incore(), which takes advantage of the
> - earlier refactorings and is responsible for writing the object to the
> - pack and handling any overage from pack.packSizeLimit.
> -
> - The bulk of the new functionality is implemented in the function
> - `stream_obj_to_pack()`, which can handle streaming objects from memory
> - to the bulk-checkin pack as a result of the earlier refactoring.
> + entrypoint delegates to `deflate_obj_to_pack_incore()`. That function in
> + turn delegates to deflate_obj_to_pack(), which is responsible for
> + formatting the pack header and then deflating the contents into the
> + pack.
>
> Consistent with the rest of the bulk-checkin mechanism, there are no
> direct tests here. In future commits when we expose this new
> @@ Commit message
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bulk-checkin.c ##
> -@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *state,
> - }
> +@@ bulk-checkin.c: static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
> + return 0;
> }
>
> -+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
> -+ git_hash_ctx *ctx,
> -+ struct hashfile_checkpoint *checkpoint,
> -+ struct object_id *result_oid,
> -+ const void *buf, size_t size,
> -+ enum object_type type,
> -+ const char *path, unsigned flags)
> ++static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state,
> ++ struct object_id *result_oid,
> ++ const void *buf, size_t size,
> ++ const char *path, enum object_type type,
> ++ unsigned flags)
> +{
> -+ struct pack_idx_entry *idx = NULL;
> -+ off_t already_hashed_to = 0;
> + struct bulk_checkin_source source = {
> + .type = SOURCE_INCORE,
> + .buf = buf,
> @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
> + .path = path,
> + };
> +
> -+ /* Note: idx is non-NULL when we are writing */
> -+ if (flags & HASH_WRITE_OBJECT)
> -+ CALLOC_ARRAY(idx, 1);
> -+
> -+ while (1) {
> -+ prepare_checkpoint(state, checkpoint, idx, flags);
> -+
> -+ if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source,
> -+ type, flags))
> -+ break;
> -+ truncate_checkpoint(state, checkpoint, idx);
> -+ bulk_checkin_source_seek_to(&source, 0);
> -+ }
> -+
> -+ finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
> -+
> -+ return 0;
> -+}
> -+
> -+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
> -+ struct object_id *result_oid,
> -+ const void *buf, size_t size,
> -+ const char *path, unsigned flags)
> -+{
> -+ git_hash_ctx ctx;
> -+ struct hashfile_checkpoint checkpoint = {0};
> -+
> -+ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
> -+ size);
> -+
> -+ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
> -+ result_oid, buf, size,
> -+ OBJ_BLOB, path, flags);
> ++ return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags);
> +}
> +
> static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> @@ bulk-checkin.c: int index_blob_bulk_checkin(struct object_id *oid,
> + const void *buf, size_t size,
> + const char *path, unsigned flags)
> +{
> -+ int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
> -+ buf, size, path, flags);
> ++ int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
> ++ buf, size, path, OBJ_BLOB,
> ++ flags);
> + if (!odb_transaction_nesting)
> + flush_bulk_checkin_packfile(&bulk_checkin_packfile);
> + return status;
> 9: cba043ef14 ! 6: 60568f9281 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
> @@ Commit message
> machinery will have enough to keep track of the converted object's hash
> in order to update the compatibility mapping.
>
> - Within `deflate_tree_to_pack_incore()`, the changes should be limited
> - to something like:
> + Within some thin wrapper around `deflate_obj_to_pack_incore()` (perhaps
> + `deflate_tree_to_pack_incore()`), the changes should be limited to
> + something like:
>
> struct strbuf converted = STRBUF_INIT;
> if (the_repository->compat_hash_algo) {
> @@ Commit message
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bulk-checkin.c ##
> -@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
> - OBJ_BLOB, path, flags);
> - }
> -
> -+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
> -+ struct object_id *result_oid,
> -+ const void *buf, size_t size,
> -+ const char *path, unsigned flags)
> -+{
> -+ git_hash_ctx ctx;
> -+ struct hashfile_checkpoint checkpoint = {0};
> -+
> -+ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
> -+ size);
> -+
> -+ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
> -+ result_oid, buf, size,
> -+ OBJ_TREE, path, flags);
> -+}
> -+
> - static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> - struct object_id *result_oid,
> - int fd, size_t size,
> @@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
> return status;
> }
> @@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
> + const void *buf, size_t size,
> + const char *path, unsigned flags)
> +{
> -+ int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
> -+ buf, size, path, flags);
> ++ int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
> ++ buf, size, path, OBJ_TREE,
> ++ flags);
> + if (!odb_transaction_nesting)
> + flush_bulk_checkin_packfile(&bulk_checkin_packfile);
> + return status;
> 10: ae70508037 = 7: b9be9df122 builtin/merge-tree.c: implement support for `--write-pack`
> --
> 2.42.0.405.g86fe3250c2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-10-23 9:19 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-19 17:28 [PATCH v4 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-19 17:28 ` [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-20 7:35 ` Jeff King
2023-10-20 16:55 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-19 17:28 ` [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-23 9:19 ` Patrick Steinhardt
2023-10-23 18:58 ` Jeff King
2023-10-24 6:34 ` Patrick Steinhardt
2023-10-24 17:08 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:29 ` [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-19 21:47 ` [PATCH v4 0/7] merge-ort: implement support for packing objects together Junio C Hamano
2023-10-20 7:29 ` Jeff King
2023-10-20 16:53 ` Junio C Hamano
2023-10-23 9:19 ` Patrick Steinhardt [this message]
2023-10-23 22:44 ` [PATCH v5 0/5] " Taylor Blau
2023-10-23 22:44 ` [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-25 7:37 ` Jeff King
2023-10-25 15:39 ` Taylor Blau
2023-10-27 23:12 ` Junio C Hamano
2023-10-23 22:44 ` [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-23 22:45 ` [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:44 ` Taylor Blau
2023-10-25 17:21 ` Eric Sunshine
2023-10-26 8:16 ` Patrick Steinhardt
2023-11-11 0:17 ` Elijah Newren
2023-10-23 22:45 ` [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-23 22:45 ` [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:46 ` Taylor Blau
2023-11-10 23:51 ` Elijah Newren
2023-11-11 0:27 ` Junio C Hamano
2023-11-11 1:34 ` Taylor Blau
2023-11-11 1:24 ` Taylor Blau
2023-11-13 22:05 ` Jeff King
2023-11-14 1:40 ` Junio C Hamano
2023-11-14 2:54 ` Elijah Newren
2023-11-14 21:55 ` Jeff King
2023-11-14 3:08 ` Elijah Newren
2023-11-13 22:02 ` Jeff King
2023-11-13 22:34 ` Taylor Blau
2023-11-14 2:50 ` Elijah Newren
2023-11-14 21:53 ` Jeff King
2023-11-14 22:04 ` Jeff King
2023-10-23 23:31 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Junio C Hamano
2023-11-06 15:46 ` Johannes Schindelin
2023-11-06 23:19 ` Junio C Hamano
2023-11-07 3:42 ` Jeff King
2023-11-07 15:58 ` Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 1/3] merge-ort.c: finalize ODB transactions after each step Taylor Blau
2023-11-11 3:45 ` Elijah Newren
2023-11-07 18:22 ` [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()` Taylor Blau
2023-11-08 7:05 ` Patrick Steinhardt
2023-11-09 19:26 ` Taylor Blau
2023-11-07 18:23 ` [RFC PATCH 3/3] builtin/replay.c: introduce `--write-pack` Taylor Blau
2023-11-11 3:42 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Elijah Newren
2023-11-11 4:04 ` Elijah Newren
2023-10-25 7:58 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZTY6lwtMgtRwmMrB@tanuki \
--to=ps@pks.im \
--cc=ebiederm@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).