From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v3 00/10] merge-ort: implement support for packing objects together
Date: Wed, 18 Oct 2023 13:07:45 -0400
Message-ID: <cover.1697648864.git.me@ttaylorr.com>
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>
This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose objects.
The notable change since v2 is in response to a suggestion[1] from
Junio to factor out an abstract bulk-checkin "source", which
significantly reduced the duplication between a couple of functions in
the earlier round.
Beyond that, the changes since last time can be viewed in the range-diff
below. Thanks in advance for any review!
[1]: https://lore.kernel.org/git/xmqq5y34wu5f.fsf@gitster.g/
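
To illustrate the idea of an abstract "source" that can be backed either
by a file descriptor or by an in-memory buffer, here is a minimal,
self-contained sketch. It is not the code from the series: the names
`struct source`, `source_read()` and `source_seek_to()` are hypothetical
stand-ins, and only the `SOURCE_INCORE` mode and the `buf`, `size` and
`read` fields mirror what appears in the range-diff below.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the series' `bulk_checkin_source`. */
enum source_type { SOURCE_FILE, SOURCE_INCORE };

struct source {
	enum source_type type;
	int fd;           /* SOURCE_FILE only */
	const void *buf;  /* SOURCE_INCORE only */
	size_t size;      /* SOURCE_INCORE: total bytes in buf */
	size_t read;      /* SOURCE_INCORE: bytes consumed so far */
};

/* Copy up to nr bytes into out; returns bytes copied, or -1 on error. */
static ssize_t source_read(struct source *src, void *out, size_t nr)
{
	switch (src->type) {
	case SOURCE_FILE:
		return read(src->fd, out, nr);
	case SOURCE_INCORE: {
		size_t left;

		assert(src->read <= src->size);
		left = src->size - src->read;
		if (nr > left)
			nr = left;
		memcpy(out, (const char *)src->buf + src->read, nr);
		src->read += nr;
		return (ssize_t)nr;
	}
	}
	return -1;
}

/* Rewind (or reposition) the source; returns the new offset or -1. */
static off_t source_seek_to(struct source *src, off_t offset)
{
	switch (src->type) {
	case SOURCE_FILE:
		return lseek(src->fd, offset, SEEK_SET);
	case SOURCE_INCORE:
		if (offset < 0 || (size_t)offset > src->size)
			return (off_t)-1;
		src->read = (size_t)offset;
		return offset;
	}
	return (off_t)-1;
}
```

With such an interface, a single streaming routine can consume either
kind of source through `source_read()`, and a caller that has to retry
can rewind with `source_seek_to(&src, 0)` without knowing which kind it
holds.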
Taylor Blau (10):
  bulk-checkin: factor out `format_object_header_hash()`
  bulk-checkin: factor out `prepare_checkpoint()`
  bulk-checkin: factor out `truncate_checkpoint()`
  bulk-checkin: factor out `finalize_checkpoint()`
  bulk-checkin: extract abstract `bulk_checkin_source`
  bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
  bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
  bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
  bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
  builtin/merge-tree.c: implement support for `--write-pack`
 Documentation/git-merge-tree.txt |   4 +
 builtin/merge-tree.c             |   5 +
 bulk-checkin.c                   | 288 +++++++++++++++++++++++++------
 bulk-checkin.h                   |   8 +
 merge-ort.c                      |  42 ++++-
 merge-recursive.h                |   1 +
 t/t4301-merge-tree-write-tree.sh |  93 ++++++++++
 7 files changed, 381 insertions(+), 60 deletions(-)
Range-diff against v2:
1: edf1cbafc1 = 1: 2dffa45183 bulk-checkin: factor out `format_object_header_hash()`
2: b3f89d5853 = 2: 7a10dc794a bulk-checkin: factor out `prepare_checkpoint()`
3: abe4fb0a59 = 3: 20c32d2178 bulk-checkin: factor out `truncate_checkpoint()`
4: 0b855a6eb7 ! 4: 893051d0b7 bulk-checkin: factor our `finalize_checkpoint()`
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- bulk-checkin: factor our `finalize_checkpoint()`
+ bulk-checkin: factor out `finalize_checkpoint()`
In a similar spirit as previous commits, factor out the routine to
finalize the just-written object from the bulk-checkin mechanism.
-: ---------- > 5: da52ec8380 bulk-checkin: extract abstract `bulk_checkin_source`
-: ---------- > 6: 4e9bac5bc1 bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
-: ---------- > 7: 04ec74e357 bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
5: 239bf39bfb ! 8: 8667b76365 bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
@@ Commit message
entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
responsible for formatting the pack header and then deflating the
contents into the pack. The latter is accomplished by calling
- deflate_blob_contents_to_pack_incore(), which takes advantage of the
- earlier refactoring and is responsible for writing the object to the
+ deflate_obj_contents_to_pack_incore(), which takes advantage of the
+ earlier refactorings and is responsible for writing the object to the
pack and handling any overage from pack.packSizeLimit.
The bulk of the new functionality is implemented in the function
- `stream_obj_to_pack_incore()`, which is a generic implementation for
- writing objects of arbitrary type (whose contents we can fit in-core)
- into a bulk-checkin pack.
-
- The new function shares an unfortunate degree of similarity to the
- existing `stream_blob_to_pack()` function. But DRY-ing up these two
- would likely be more trouble than it's worth, since the latter has to
- deal with reading and writing the contents of the object.
+ `stream_obj_to_pack()`, which can handle streaming objects from memory
+ to the bulk-checkin pack as a result of the earlier refactoring.
Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
@@ Commit message
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## bulk-checkin.c ##
-@@ bulk-checkin.c: static int already_written(struct bulk_checkin_packfile *state, struct object_id
- return 0;
- }
-
-+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
-+ git_hash_ctx *ctx,
-+ off_t *already_hashed_to,
-+ const void *buf, size_t size,
-+ enum object_type type,
-+ const char *path, unsigned flags)
-+{
-+ git_zstream s;
-+ unsigned char obuf[16384];
-+ unsigned hdrlen;
-+ int status = Z_OK;
-+ int write_object = (flags & HASH_WRITE_OBJECT);
-+
-+ git_deflate_init(&s, pack_compression_level);
-+
-+ hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
-+ s.next_out = obuf + hdrlen;
-+ s.avail_out = sizeof(obuf) - hdrlen;
-+
-+ if (*already_hashed_to < size) {
-+ size_t hsize = size - *already_hashed_to;
-+ if (hsize) {
-+ the_hash_algo->update_fn(ctx, buf, hsize);
-+ }
-+ *already_hashed_to = size;
-+ }
-+ s.next_in = (void *)buf;
-+ s.avail_in = size;
-+
-+ while (status != Z_STREAM_END) {
-+ status = git_deflate(&s, Z_FINISH);
-+ if (!s.avail_out || status == Z_STREAM_END) {
-+ if (write_object) {
-+ size_t written = s.next_out - obuf;
-+
-+ /* would we bust the size limit? */
-+ if (state->nr_written &&
-+ pack_size_limit_cfg &&
-+ pack_size_limit_cfg < state->offset + written) {
-+ git_deflate_abort(&s);
-+ return -1;
-+ }
-+
-+ hashwrite(state->f, obuf, written);
-+ state->offset += written;
-+ }
-+ s.next_out = obuf;
-+ s.avail_out = sizeof(obuf);
-+ }
-+
-+ switch (status) {
-+ case Z_OK:
-+ case Z_BUF_ERROR:
-+ case Z_STREAM_END:
-+ continue;
-+ default:
-+ die("unexpected deflate failure: %d", status);
-+ }
-+ }
-+ git_deflate_end(&s);
-+ return 0;
-+}
-+
- /*
- * Read the contents from fd for size bytes, streaming it to the
- * packfile in state while updating the hash in ctx. Signal a failure
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *state,
}
}
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+{
+ struct pack_idx_entry *idx = NULL;
+ off_t already_hashed_to = 0;
++ struct bulk_checkin_source source = {
++ .type = SOURCE_INCORE,
++ .buf = buf,
++ .size = size,
++ .read = 0,
++ .path = path,
++ };
+
+ /* Note: idx is non-NULL when we are writing */
+ if (flags & HASH_WRITE_OBJECT)
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+
+ while (1) {
+ prepare_checkpoint(state, checkpoint, idx, flags);
-+ if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
-+ buf, size, type, path, flags))
++
++ if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source,
++ type, flags))
+ break;
+ truncate_checkpoint(state, checkpoint, idx);
++ bulk_checkin_source_seek_to(&source, 0);
+ }
+
+ finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
6: 57613807d8 = 9: cba043ef14 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
7: f21400f56c = 10: ae70508037 builtin/merge-tree.c: implement support for `--write-pack`
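
The checkpoint-and-retry loop shown in the range-diff for patch 8 (stream
the object, and if it would bust the pack size limit, truncate back to
the checkpoint, start over, and rewind the source) can be sketched in
isolation as follows. This is a simplified model under stated
assumptions, not git's implementation: `struct pack`, `stream_to_pack()`,
`finish_pack()`, `write_object()` and the 4-byte `CHUNK` are all
hypothetical, and the "pack" is just a small in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* hypothetical write granularity */

/* Hypothetical, simplified pack state; not git's `bulk_checkin_packfile`. */
struct pack {
	unsigned char data[64];
	size_t offset;       /* bytes written to the current pack */
	size_t limit;        /* 0 means no limit is configured */
	unsigned nr_written; /* objects already in the current pack */
	unsigned nr_packs;   /* packs finished so far */
};

/*
 * Append buf in CHUNK-sized pieces. Returns -1 as soon as a non-empty
 * pack would exceed its limit; earlier chunks have already been written
 * by then, so the caller must roll back to its checkpoint.
 */
static int stream_to_pack(struct pack *p, const unsigned char *buf, size_t size)
{
	size_t done = 0;

	while (done < size) {
		size_t n = size - done < CHUNK ? size - done : CHUNK;

		/* would we bust the size limit? */
		if (p->nr_written && p->limit && p->limit < p->offset + n)
			return -1;
		assert(p->offset + n <= sizeof(p->data));
		memcpy(p->data + p->offset, buf + done, n);
		p->offset += n;
		done += n;
	}
	return 0;
}

/* Close out the current pack and start an empty one. */
static void finish_pack(struct pack *p)
{
	p->nr_packs++;
	p->offset = 0;
	p->nr_written = 0;
}

static void write_object(struct pack *p, const unsigned char *buf, size_t size)
{
	for (;;) {
		size_t checkpoint = p->offset; /* remember where we started */

		if (!stream_to_pack(p, buf, size))
			break;
		p->offset = checkpoint; /* discard the partial write */
		finish_pack(p);         /* retry from the start in a new pack */
	}
	p->nr_written++;
}
```

Because the limit is only enforced once the pack already holds an
object, the retry always succeeds in the fresh pack, so the loop runs at
most twice per object in this model.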
--
2.42.0.408.g97fac66ae4