From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
Date: Wed, 25 Oct 2023 09:58:06 +0200
Message-ID: <ZTjKjkRMkmCuxDU1@tanuki>
In-Reply-To: <d8cf8e4395375f88fe4e1ade2b79a3be6ce5fb12.1698101088.git.me@ttaylorr.com>

On Mon, Oct 23, 2023 at 06:45:01PM -0400, Taylor Blau wrote:
> Introduce `index_blob_bulk_checkin_incore()` which allows streaming
> arbitrary blob contents from memory into the bulk-checkin pack.
>
> In order to support streaming from a location in memory, we must
> implement a new kind of bulk_checkin_source that does just that. These
> implementation in spread out across:

Nit: the grammar is a bit off here; presumably this should read "This
implementation is spread out across:" or similar. Probably not worth a
reroll though.

> - init_bulk_checkin_source_incore()
> - bulk_checkin_source_read_incore()
> - bulk_checkin_source_seek_incore()
>
> Note that, unlike file descriptors, which manage their own offset
> internally, we have to keep track of how many bytes we've read out of
> the buffer, and make sure we don't read past the end of the buffer.
>
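
For readers following along, the bookkeeping described above can be
sketched in isolation. This is only a minimal stand-in, assuming a
hypothetical `struct mem_source` with made-up helper names; it is not
the struct or the functions from the patch, and it uses assert() where
the patch uses BUG():

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>	/* off_t (POSIX) */

/* Hypothetical stand-in for the in-core source; not git's actual struct. */
struct mem_source {
	const void *buf;	/* caller-owned data */
	size_t size;		/* total number of bytes in buf */
	size_t nr_read;		/* read cursor, analogous to an fd's offset */
};

/* Copy up to nr bytes at the cursor into out; returns bytes copied (0 at end). */
static off_t mem_source_read(struct mem_source *src, void *out, size_t nr)
{
	const unsigned char *p = src->buf;

	assert(src->nr_read <= src->size);	/* the patch calls BUG() here */

	if (nr > src->size - src->nr_read)
		nr = src->size - src->nr_read;	/* clamp to the remaining bytes */

	memcpy(out, p + src->nr_read, nr);
	src->nr_read += nr;
	return (off_t)nr;
}

/* Move the cursor; like the patch, reject offsets outside [0, size). */
static off_t mem_source_seek(struct mem_source *src, off_t offset)
{
	if (offset < 0 || (size_t)offset >= src->size)
		return (off_t)-1;
	src->nr_read = (size_t)offset;
	return offset;
}
```

Note that with this bounds check, as in the patch, seeking to an offset
equal to the buffer size is rejected rather than treated as "at end".
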
> This will be useful in a couple of more commits in order to provide the
> `merge-tree` builtin with a mechanism to create a new pack containing
> any objects it created during the merge, instead of storing those
> objects individually as loose.
>
> Similar to the existing `index_blob_bulk_checkin()` function, the
> entrypoint delegates to `deflate_obj_to_pack_incore()`. That function in
> turn delegates to deflate_obj_to_pack(), which is responsible for
> formatting the pack header and then deflating the contents into the
> pack.
>
> Consistent with the rest of the bulk-checkin mechanism, there are no
> direct tests here. In future commits when we expose this new
> functionality via the `merge-tree` builtin, we will test it indirectly
> there.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> bulk-checkin.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
> bulk-checkin.h | 4 +++
> 2 files changed, 79 insertions(+)
>
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 79776e679e..b728210bc7 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -148,6 +148,10 @@ struct bulk_checkin_source {
>  		struct {
>  			int fd;
>  		} from_fd;
> +		struct {
> +			const void *buf;
> +			size_t nr_read;
> +		} incore;
>  	} data;
>
>  	size_t size;
> @@ -166,6 +170,36 @@ static off_t bulk_checkin_source_seek_from_fd(struct bulk_checkin_source *source
>  	return lseek(source->data.from_fd.fd, offset, SEEK_SET);
>  }
>
> +static off_t bulk_checkin_source_read_incore(struct bulk_checkin_source *source,
> +					     void *buf, size_t nr)
> +{
> +	const unsigned char *src = source->data.incore.buf;
> +
> +	if (source->data.incore.nr_read > source->size)
> +		BUG("read beyond bulk-checkin source buffer end "
> +		    "(%"PRIuMAX" > %"PRIuMAX")",
> +		    (uintmax_t)source->data.incore.nr_read,
> +		    (uintmax_t)source->size);
> +
> +	if (nr > source->size - source->data.incore.nr_read)
> +		nr = source->size - source->data.incore.nr_read;
> +
> +	src += source->data.incore.nr_read;
> +
> +	memcpy(buf, src, nr);
> +	source->data.incore.nr_read += nr;
> +	return nr;
> +}
> +
> +static off_t bulk_checkin_source_seek_incore(struct bulk_checkin_source *source,
> +					     off_t offset)
> +{
> +	if (!(0 <= offset && offset < source->size))
> +		return (off_t)-1;

At the risk of showing my own ignorance: why is the cast here
necessary?
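
My reading, which may well be wrong, is that it is not strictly needed:
a plain `return -1;` in a function returning `off_t` is converted
implicitly, so `(off_t)-1` mostly mirrors how lseek(2) documents its
error value. A minimal sketch of that assumption, using made-up helper
names that do not exist in git:

```c
#include <assert.h>
#include <sys/types.h>	/* off_t (POSIX, a signed integer type) */

/* Hypothetical helpers, only to compare the two spellings of the error value. */
static off_t fail_with_cast(void)
{
	return (off_t)-1;	/* the spelling lseek(2) documents */
}

static off_t fail_without_cast(void)
{
	return -1;		/* int -1 is converted to off_t by the return */
}

/* Both spellings should yield the same negative off_t value. */
static void check_error_values(void)
{
	assert(fail_with_cast() == fail_without_cast());
	assert(fail_without_cast() < 0);
}
```

If that holds, the cast is a readability choice rather than a
correctness requirement.
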
Patrick
> +	source->data.incore.nr_read = offset;
> +	return source->data.incore.nr_read;
> +}
> +
>  static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
>  					     int fd, size_t size,
>  					     const char *path)
> @@ -181,6 +215,22 @@ static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
>  	source->path = path;
>  }
>
> +static void init_bulk_checkin_source_incore(struct bulk_checkin_source *source,
> +					    const void *buf, size_t size,
> +					    const char *path)
> +{
> +	memset(source, 0, sizeof(struct bulk_checkin_source));
> +
> +	source->read = bulk_checkin_source_read_incore;
> +	source->seek = bulk_checkin_source_seek_incore;
> +
> +	source->data.incore.buf = buf;
> +	source->data.incore.nr_read = 0;
> +
> +	source->size = size;
> +	source->path = path;
> +}
> +
>  /*
>   * Read the contents from 'source' for 'size' bytes, streaming it to the
>   * packfile in state while updating the hash in ctx. Signal a failure
> @@ -359,6 +409,19 @@ static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
>  	return 0;
>  }
>
> +static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state,
> +				      struct object_id *result_oid,
> +				      const void *buf, size_t size,
> +				      const char *path, enum object_type type,
> +				      unsigned flags)
> +{
> +	struct bulk_checkin_source source;
> +
> +	init_bulk_checkin_source_incore(&source, buf, size, path);
> +
> +	return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags);
> +}
> +
>  static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>  				struct object_id *result_oid,
>  				int fd, size_t size,
> @@ -421,6 +484,18 @@ int index_blob_bulk_checkin(struct object_id *oid,
>  	return status;
>  }
>
> +int index_blob_bulk_checkin_incore(struct object_id *oid,
> +				   const void *buf, size_t size,
> +				   const char *path, unsigned flags)
> +{
> +	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
> +						buf, size, path, OBJ_BLOB,
> +						flags);
> +	if (!odb_transaction_nesting)
> +		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
> +	return status;
> +}
> +
>  void begin_odb_transaction(void)
>  {
>  	odb_transaction_nesting += 1;
> diff --git a/bulk-checkin.h b/bulk-checkin.h
> index aa7286a7b3..1b91daeaee 100644
> --- a/bulk-checkin.h
> +++ b/bulk-checkin.h
> @@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
>  			    int fd, size_t size,
>  			    const char *path, unsigned flags);
>
> +int index_blob_bulk_checkin_incore(struct object_id *oid,
> +				   const void *buf, size_t size,
> +				   const char *path, unsigned flags);
> +
>  /*
>   * Tell the object database to optimize for adding
>   * multiple objects. end_odb_transaction must be called
> --
> 2.42.0.425.g963d08ddb3.dirty
>