From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/7] merge-ort: implement support for packing objects together
Date: Tue, 17 Oct 2023 12:31:08 -0400
Message-ID: <cover.1697560266.git.me@ttaylorr.com>
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

(Previously based on 'eb/limit-bulk-checkin-to-blobs', which has since
been merged. This series is now based on the tip of 'master', which is
a9ecda2788 (The eighteenth batch, 2023-10-13) at the time of writing).

This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.

Much is unchanged since last time, except for a small tweak to one of
the commit messages in response to feedback from Eric W. Biederman. The
series has also been rebased onto 'master', which had a couple of
conflicts that I resolved pertaining to:

  - 9eb5419799 (bulk-checkin: only support blobs in index_bulk_checkin,
    2023-09-26)
  - e0b8c84240 (treewide: fix various bugs w/ OpenSSL 3+ EVP API,
    2023-09-01)

They were mostly trivial resolutions, and the results can be viewed in
the range-diff included below.
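
To illustrate the first resolution visible in the range-diff:
`format_object_header_hash()` now also takes the `hashfile_checkpoint`
and initializes its hash context, instead of leaving that to each
caller. What follows is a minimal, self-contained sketch of that shape.
It is not the code from this series: the `toy_*` names are hypothetical,
and 64-bit FNV-1a stands in for Git's real hash algorithms purely so
that the example compiles on its own.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for git_hash_ctx: 64-bit FNV-1a, not a hash Git uses. */
struct toy_ctx {
	uint64_t state;
};

/* Hypothetical stand-in for struct hashfile_checkpoint: carries its own ctx. */
struct toy_checkpoint {
	struct toy_ctx ctx;
};

static void toy_init(struct toy_ctx *c)
{
	c->state = UINT64_C(0xcbf29ce484222325); /* FNV-1a offset basis */
}

static void toy_update(struct toy_ctx *c, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		c->state ^= *p++;
		c->state *= UINT64_C(0x100000001b3); /* FNV-1a prime */
	}
}

/*
 * Write the object header "<type> <size>" followed by a NUL byte into
 * hdr, and return the header length including that NUL.
 */
static int toy_format_header(char *hdr, size_t hdrlen,
			     const char *type, size_t size)
{
	int len = snprintf(hdr, hdrlen, "%s %" PRIuMAX,
			   type, (uintmax_t)size) + 1;

	assert(len > 0 && (size_t)len <= hdrlen);
	return len;
}

/*
 * Mirrors the v2 shape of the helper: start the object hash with the
 * header, and initialize the checkpoint's context in the same place.
 */
static void toy_format_object_header_hash(struct toy_ctx *ctx,
					  struct toy_checkpoint *checkpoint,
					  const char *type, size_t size)
{
	char hdr[64];
	int len = toy_format_header(hdr, sizeof(hdr), type, size);

	toy_init(ctx);
	toy_update(ctx, hdr, (size_t)len);
	toy_init(&checkpoint->ctx);
}
```

Calling `toy_format_object_header_hash(&ctx, &checkpoint, "blob", 12)`
feeds the 8-byte header `blob 12\0` into `ctx` and leaves
`checkpoint.ctx` freshly initialized, so callers no longer need a
separate init call of their own.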

(From last time: the motivating use-case behind these changes is to
better support repositories that invoke merge-tree frequently, which
can generate a large number of loose objects and, in turn, hurt
performance.)

Thanks in advance for any review!

Taylor Blau (7):
  bulk-checkin: factor out `format_object_header_hash()`
  bulk-checkin: factor out `prepare_checkpoint()`
  bulk-checkin: factor out `truncate_checkpoint()`
  bulk-checkin: factor our `finalize_checkpoint()`
  bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
  bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
  builtin/merge-tree.c: implement support for `--write-pack`

 Documentation/git-merge-tree.txt |   4 +
 builtin/merge-tree.c             |   5 +
 bulk-checkin.c                   | 258 ++++++++++++++++++++++++++-----
 bulk-checkin.h                   |   8 +
 merge-ort.c                      |  42 +++--
 merge-recursive.h                |   1 +
 t/t4301-merge-tree-write-tree.sh |  93 +++++++++++
 7 files changed, 363 insertions(+), 48 deletions(-)

Range-diff against v1:
1: 37f4072815 ! 1: edf1cbafc1 bulk-checkin: factor out `format_object_header_hash()`
@@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
}
+static void format_object_header_hash(const struct git_hash_algo *algop,
-+ git_hash_ctx *ctx, enum object_type type,
++ git_hash_ctx *ctx,
++ struct hashfile_checkpoint *checkpoint,
++ enum object_type type,
+ size_t size)
+{
+ unsigned char header[16384];
@@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
+
+ algop->init_fn(ctx);
+ algop->update_fn(ctx, header, header_len);
++ algop->init_fn(&checkpoint->ctx);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
- OBJ_BLOB, size);
- the_hash_algo->init_fn(&ctx);
- the_hash_algo->update_fn(&ctx, obuf, header_len);
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
+- the_hash_algo->init_fn(&checkpoint.ctx);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
++ size);
/* Note: idx is non-NULL when we are writing */
if ((flags & HASH_WRITE_OBJECT) != 0)
2: 9cc1f3014a ! 2: b3f89d5853 bulk-checkin: factor out `prepare_checkpoint()`
@@ Commit message
## bulk-checkin.c ##
@@ bulk-checkin.c: static void format_object_header_hash(const struct git_hash_algo *algop,
- algop->update_fn(ctx, header, header_len);
+ algop->init_fn(&checkpoint->ctx);
}
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
3: f392ed2211 = 3: abe4fb0a59 bulk-checkin: factor out `truncate_checkpoint()`
4: 9c6ca564ad = 4: 0b855a6eb7 bulk-checkin: factor our `finalize_checkpoint()`
5: 30ca7334c7 ! 5: 239bf39bfb bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+ git_hash_ctx *ctx,
++ struct hashfile_checkpoint *checkpoint,
+ struct object_id *result_oid,
+ const void *buf, size_t size,
+ enum object_type type,
+ const char *path, unsigned flags)
+{
-+ struct hashfile_checkpoint checkpoint = {0};
+ struct pack_idx_entry *idx = NULL;
+ off_t already_hashed_to = 0;
+
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+ CALLOC_ARRAY(idx, 1);
+
+ while (1) {
-+ prepare_checkpoint(state, &checkpoint, idx, flags);
++ prepare_checkpoint(state, checkpoint, idx, flags);
+ if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+ buf, size, type, path, flags))
+ break;
-+ truncate_checkpoint(state, &checkpoint, idx);
++ truncate_checkpoint(state, checkpoint, idx);
+ }
+
-+ finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
++ finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
+
+ return 0;
+}
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+ const char *path, unsigned flags)
+{
+ git_hash_ctx ctx;
++ struct hashfile_checkpoint checkpoint = {0};
+
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
++ size);
+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
-+ buf, size, OBJ_BLOB, path,
-+ flags);
++ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
++ result_oid, buf, size,
++ OBJ_BLOB, path, flags);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
6: cb0f79cabb ! 6: 57613807d8 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
@@ Commit message
Within `deflate_tree_to_pack_incore()`, the changes should be limited
to something like:
+ struct strbuf converted = STRBUF_INIT;
if (the_repository->compat_hash_algo) {
- struct strbuf converted = STRBUF_INIT;
if (convert_object_file(&compat_obj,
the_repository->hash_algo,
the_repository->compat_hash_algo, ...) < 0)
@@ Commit message
format_object_header_hash(the_repository->compat_hash_algo,
OBJ_TREE, size);
-
- strbuf_release(&converted);
}
+ /* compute the converted tree's hash using the compat algorithm */
+ strbuf_release(&converted);
, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
@@ Commit message
## bulk-checkin.c ##
@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
- flags);
+ OBJ_BLOB, path, flags);
}
+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packf
+ const char *path, unsigned flags)
+{
+ git_hash_ctx ctx;
++ struct hashfile_checkpoint checkpoint = {0};
+
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
++ size);
+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
-+ buf, size, OBJ_TREE, path,
-+ flags);
++ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
++ result_oid, buf, size,
++ OBJ_TREE, path, flags);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
7: e969210145 ! 7: f21400f56c builtin/merge-tree.c: implement support for `--write-pack`
@@ merge-ort.c
* We have many arrays of size 3. Whenever we have such an array, the
@@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
if ((merge_status < 0) || !result_buf.ptr)
- ret = err(opt, _("Failed to execute internal merge"));
+ ret = error(_("failed to execute internal merge"));
- if (!ret &&
- write_object_file(result_buf.ptr, result_buf.size,
- OBJ_BLOB, &result->oid))
-- ret = err(opt, _("Unable to add %s to database"),
-- path);
+- ret = error(_("unable to add %s to database"), path);
+ if (!ret) {
+ ret = opt->write_pack
+ ? index_blob_bulk_checkin_incore(&result->oid,
@@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
+ result_buf.size,
+ OBJ_BLOB, &result->oid);
+ if (ret)
-+ ret = err(opt, _("Unable to add %s to database"),
-+ path);
++ ret = error(_("unable to add %s to database"),
++ path);
+ }
free(result_buf.ptr);
--
2.42.0.405.gdb2a2f287e