From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, gitster@pobox.com, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v2 0/7] odb: add write operation to ODB transaction interface
Date: Tue, 31 Mar 2026 22:03:08 -0500
Message-ID: <20260401030316.1847362-1-jltobler@gmail.com>
In-Reply-To: <20260331033835.2863514-1-jltobler@gmail.com>
Greetings,
This series lays the groundwork for introducing write operations to the
ODB transaction interface. The eventual goal is for all object writes
performed within a transaction to go through this interface explicitly,
rather than implicitly relying on the transaction to reconfigure ODB
sources so that writes are redirected to a temporary location.
For now, only `odb_transaction_write_object_stream()` is implemented and
wires up the existing logic for streaming "large" blobs directly into a
packfile as part of the transaction.
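Routing writes through the transaction explicitly can be pictured as a small vtable. The generic transaction carries function pointers that each backend fills in when the transaction begins, and a backend recovers its own state from the generic pointer with a container_of()-style cast. The following is a simplified, self-contained sketch. Every `toy_*` name is hypothetical, and it does not reproduce Git's actual `struct odb_transaction` or its callback signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT Git's real definitions. */
struct toy_write_stream;

struct toy_transaction {
	/* Backend hooks, filled in by the backend's begin function. */
	int (*write_object_stream)(struct toy_transaction *tx,
				   struct toy_write_stream *stream,
				   size_t size, unsigned char *out_id);
	int (*commit)(struct toy_transaction *tx);
};

/* A backend embeds the generic struct as its "base" member. */
struct toy_files_transaction {
	struct toy_transaction base;
	size_t objects_written;
	int committed;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int toy_files_write_object_stream(struct toy_transaction *base,
					 struct toy_write_stream *stream,
					 size_t size, unsigned char *out_id)
{
	struct toy_files_transaction *tx =
		toy_container_of(base, struct toy_files_transaction, base);

	(void)stream;
	(void)size;
	/* A real backend would stream the data into a packfile here. */
	memset(out_id, 0, 20);
	tx->objects_written++;
	return 0;
}

static int toy_files_commit(struct toy_transaction *base)
{
	struct toy_files_transaction *tx =
		toy_container_of(base, struct toy_files_transaction, base);

	tx->committed = 1;
	return 0;
}

/* The backend's "begin": zero its state and plug in its hooks. */
static void toy_files_begin(struct toy_files_transaction *tx)
{
	memset(tx, 0, sizeof(*tx));
	tx->base.write_object_stream = toy_files_write_object_stream;
	tx->base.commit = toy_files_commit;
}

/* Generic entry point: callers never need to know which backend is active. */
static int toy_transaction_write_object_stream(struct toy_transaction *tx,
					       struct toy_write_stream *stream,
					       size_t size,
					       unsigned char *out_id)
{
	if (!tx->write_object_stream)
		return -1; /* backend does not support streaming writes */
	return tx->write_object_stream(tx, stream, size, out_id);
}
```

Returning an error when a backend leaves the hook unset is one possible choice for the dispatcher, assumed here for illustration only.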
Most of the patches are structural refactorings to enable this, but
patch 5 introduces a behavioral change in how packfiles that would
exceed "pack.packSizeLimit" are handled.
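The "pack.packSizeLimit" check reduces to a single predicate that is evaluated before anything is written. The sketch below is a hypothetical standalone helper (`should_flush_before_write()` is not a Git function) modeled on the condition that appears in the range-diff. Because the inflated size is only an approximation, the check may flush early and leave a pack somewhat below the limit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decide *before* writing whether appending an
 * object of `inflated_size` bytes could push the current packfile past
 * the configured limit.
 *
 * - nr_written: objects already in the current packfile
 * - limit:      "pack.packSizeLimit" in bytes, 0 meaning "no limit"
 * - offset:     bytes written to the current packfile so far
 */
static int should_flush_before_write(uint32_t nr_written, uint64_t limit,
				     uint64_t offset, uint64_t inflated_size)
{
	/*
	 * Streamed blobs are never deltified, so their on-disk size differs
	 * from the inflated size mainly by zlib compression. The inflated
	 * size is therefore used as an approximation.
	 */
	return nr_written && limit && limit < offset + inflated_size;
}
```

The `nr_written` guard means an empty pack is never flushed, so a single object larger than the limit still gets written.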
Changes since v1:
- Fixed some typos
- Improved error handling
- Removed an unnecessary `if (rsize)` guard before `git_hash_update()`
- Documented in a comment why the inflated object size is used to
  approximate whether an object write will exceed "pack.packSizeLimit"
- Updated the `struct odb_write_stream` read() callback to support
  returning errors and reading into a caller-provided buffer
- Updated the `hash_blob_stream()` function signature to operate on a
  `struct odb_write_stream` instead of an fd directly
- Renamed some variables/functions for better clarity
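The reworked read() contract has three parts: the caller supplies the buffer, a negative return signals an error, and `is_finished` marks the end of input. The minimal self-contained sketch below illustrates it. The `toy_*` names and the in-memory source are hypothetical, and the additive checksum is a toy stand-in for the object hash that `hash_blob_stream()` actually computes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified model of a write stream; NOT Git's definition. */
struct toy_write_stream {
	/* Fill `buf` with up to `len` bytes; return bytes read or -1 on error. */
	ssize_t (*read)(struct toy_write_stream *stream,
			unsigned char *buf, size_t len);
	void *data;
	int is_finished;
};

/* Example source: an in-memory buffer instead of a file descriptor. */
struct toy_mem_source {
	const unsigned char *ptr;
	size_t remaining;
};

static ssize_t toy_read_mem(struct toy_write_stream *stream,
			    unsigned char *buf, size_t len)
{
	struct toy_mem_source *src = stream->data;
	size_t count = src->remaining < len ? src->remaining : len;

	memcpy(buf, src->ptr, count);
	src->ptr += count;
	src->remaining -= count;
	if (!src->remaining)
		stream->is_finished = 1;
	return (ssize_t)count;
}

/*
 * Consumer shaped like the reworked hash_blob_stream() loop: the caller
 * owns the buffer, errors propagate as -1, and the total number of bytes
 * read is checked against the expected size.
 */
static int toy_checksum_stream(struct toy_write_stream *stream,
			       size_t expected_size, unsigned long *out)
{
	unsigned char buf[4]; /* deliberately tiny to force several reads */
	unsigned long sum = 0;
	size_t total = 0;

	while (!stream->is_finished) {
		ssize_t n = stream->read(stream, buf, sizeof(buf));

		if (n < 0)
			return -1;
		for (ssize_t i = 0; i < n; i++)
			sum += buf[i];
		total += (size_t)n;
	}
	if (total != expected_size)
		return -1;
	*out = sum;
	return 0;
}
```

Because the caller owns the buffer, the stream needs no internal staging area. By contrast, the v1 `read_object_fd_data` embedded its own 16 KiB `buf`.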
Thanks,
-Justin
Justin Tobler (7):
odb: split `struct odb_transaction` into separate header
odb/transaction: use pluggable `begin_transaction()`
odb: update `struct odb_write_stream` read() callback
object-file: remove flags from transaction packfile writes
object-file: avoid fd seekback by checking object size upfront
object-file: generalize packfile writes to use odb_write_stream
odb/transaction: make `write_object_stream()` pluggable
Makefile | 1 +
builtin/add.c | 1 +
builtin/unpack-objects.c | 20 ++--
builtin/update-index.c | 1 +
cache-tree.c | 1 +
meson.build | 1 +
object-file.c | 234 ++++++++++++++++++++-------------------
odb.c | 25 -----
odb.h | 33 +-----
odb/streaming.c | 40 +++++++
odb/streaming.h | 8 ++
odb/transaction.c | 35 ++++++
odb/transaction.h | 57 ++++++++++
read-cache.c | 1 +
14 files changed, 274 insertions(+), 184 deletions(-)
create mode 100644 odb/transaction.c
create mode 100644 odb/transaction.h
Range-diff against v1:
1: 65cb557a5a ! 1: eee372b426 odb: split `struct odb_transaction` into separate header
@@ Metadata
## Commit message ##
odb: split `struct odb_transaction` into separate header
- The current ODB transaction interface is collocated with other ODB
+ The current ODB transaction interface is colocated with other ODB
interfaces in "odb.{c,h}". Subsequent commits will expand `struct
odb_transaction` to support write operations on the transaction
directly. To keep things organized and prevent "odb.{c,h}" from becoming
2: b009d08f3e = 2: 57ac075560 odb/transaction: use pluggable `begin_transaction()`
-: ---------- > 3: 556f003d0a odb: update `struct odb_write_stream` read() callback
3: ceb44a7d25 ! 4: a9f0e5ad8a object-file: remove flags from transaction packfile writes
@@ Commit message
conditional on this flag is a bit awkward.
In preparation for this change, introduce a dedicated
- `hash_blob_stream()` helper that only computes the OID from the fd. This
- is invoked by `index_fd()` instead when the INDEX_WRITE_OBJECT is not
- set. The object write performed via `index_blob_packfile_transaction()`
- is made unconditional accordingly.
+ `hash_blob_stream()` helper that only computes the OID from a `struct
+ odb_write_stream`. This is invoked by `index_fd()` instead when the
+ INDEX_WRITE_OBJECT is not set. The object write performed via
+ `index_blob_packfile_transaction()` is made unconditional accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
@@ object-file.c: static void prepare_packfile_transaction(struct odb_transaction_f
die_errno("unable to write pack header");
}
-+static int hash_blob_stream(const struct git_hash_algo *hash_algo,
-+ struct object_id *result_oid, int fd, size_t size)
++static int hash_blob_stream(struct odb_write_stream *stream,
++ const struct git_hash_algo *hash_algo,
++ struct object_id *result_oid, size_t size)
+{
+ unsigned char buf[16384];
+ struct git_hash_ctx ctx;
+ unsigned header_len;
++ size_t total = 0;
+
+ header_len = format_object_header((char *)buf, sizeof(buf),
+ OBJ_BLOB, size);
+ hash_algo->init_fn(&ctx);
+ git_hash_update(&ctx, buf, header_len);
+
-+ while (size) {
-+ size_t rsize = size < sizeof(buf) ? size : sizeof(buf);
-+ ssize_t read_result = read_in_full(fd, buf, rsize);
++ while (!stream->is_finished) {
++ ssize_t read_result = stream->read(stream, buf, sizeof(buf));
+
-+ if ((size_t)read_result != rsize)
++ if (read_result < 0)
+ return -1;
+
-+ git_hash_update(&ctx, buf, rsize);
-+ size -= read_result;
++ git_hash_update(&ctx, buf, read_result);
++ total += read_result;
+ }
+
++ if (total != size)
++ return -1;
++
+ git_hash_final_oid(result_oid, &ctx);
+
+ return 0;
@@ object-file.c: static int index_blob_packfile_transaction(struct odb_transaction
idx->crc32 = crc32_end(state->f);
if (already_written(transaction, result_oid)) {
-@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
- int fd, struct stat *st,
- enum object_type type, const char *path, unsigned flags)
- {
-- int ret;
-+ int ret = 0;
-
- /*
- * Call xsize_t() only when needed to avoid potentially unnecessary
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
ret = index_core(istate, oid, fd, xsize_t(st->st_size),
type, path, flags);
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
- xsize_t(st->st_size),
- path, flags);
- odb_transaction_commit(transaction);
++ struct odb_write_stream stream = { 0 };
++ odb_write_stream_from_fd(&stream, fd, xsize_t(st->st_size));
++
+ if (flags & INDEX_WRITE_OBJECT) {
+ struct object_database *odb = the_repository->objects;
+ struct odb_transaction_files *files_transaction;
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
+ xsize_t(st->st_size), path);
+ odb_transaction_commit(transaction);
+ } else {
-+ if (hash_blob_stream(the_repository->hash_algo, oid, fd,
-+ xsize_t(st->st_size)))
-+ die("failed to hash blob");
++ ret = hash_blob_stream(&stream,
++ the_repository->hash_algo, oid,
++ xsize_t(st->st_size));
+ }
++
++ free(stream.data);
}
close(fd);
+
+ ## odb/streaming.c ##
+@@ odb/streaming.c: int odb_stream_blob_to_fd(struct object_database *odb,
+ odb_read_stream_close(st);
+ return result;
+ }
++
++struct read_object_fd_data {
++ int fd;
++ size_t remaining;
++};
++
++static ssize_t read_object_fd(struct odb_write_stream *stream,
++ unsigned char *buf, size_t len)
++{
++ struct read_object_fd_data *data = stream->data;
++ ssize_t read_result;
++ size_t count;
++
++ if (stream->is_finished)
++ return 0;
++
++ count = data->remaining < len ? data->remaining : len;
++ read_result = read_in_full(data->fd, buf, count);
++ if (read_result < 0 || (size_t)read_result != count)
++ return -1;
++
++ data->remaining -= count;
++ if (!data->remaining)
++ stream->is_finished = 1;
++
++ return read_result;
++}
++
++void odb_write_stream_from_fd(struct odb_write_stream *stream, int fd,
++ size_t size)
++{
++ struct read_object_fd_data *data;
++
++ CALLOC_ARRAY(data, 1);
++ data->fd = fd;
++ data->remaining = size;
++
++ stream->data = data;
++ stream->read = read_object_fd;
++}
+
+ ## odb/streaming.h ##
+@@
+ #define STREAMING_H 1
+
+ #include "object.h"
++#include "odb.h"
+
+ struct object_database;
+ struct odb_read_stream;
+@@ odb/streaming.h: int odb_stream_blob_to_fd(struct object_database *odb,
+ struct stream_filter *filter,
+ int can_seek);
+
++/*
++ * Sets up an ODB write stream that reads from an fd. The caller is expected to
++ * free the underlying stream data.
++ */
++void odb_write_stream_from_fd(struct odb_write_stream *stream, int fd,
++ size_t size);
++
+ #endif /* STREAMING_H */
4: 8849b805e9 ! 5: b7ac82ed7e object-file: avoid fd seekback by checking object size upfront
@@ Commit message
object-file: avoid fd seekback by checking object size upfront
In certain scenarios, Git handles writing blobs that exceed
- "core.bigFilesThreshold" differently by streaming the object directly
+ "core.bigFileThreshold" differently by streaming the object directly
into a packfile. When there is an active ODB transaction, these blobs
are streamed to the same packfile instead of using a separate packfile
for each. If "pack.packSizeLimit" is configured and streaming another
@@ Commit message
because the inflated size of the object is known and can be used to
approximate whether writing the object would cause the packfile to
exceed the configured limit prior to writing anything. These blobs
- written to the packfile are never deltafied thus the size difference
+ written to the packfile are never deltified thus the size difference
between what is written versus the inflated size is due to zlib
compression. While this does prevent packfiles from being filled to the
potential maximum is some cases, it should be good enough and still
@@ Commit message
Signed-off-by: Justin Tobler <jltobler@gmail.com>
## object-file.c ##
-@@ object-file.c: static int hash_blob_stream(const struct git_hash_algo *hash_algo,
+@@ object-file.c: static int hash_blob_stream(struct odb_write_stream *stream,
/*
* Read the contents from fd for size bytes, streaming it to the
@@ object-file.c: static int stream_blob_to_pack(struct transaction_packfile *state
- *already_hashed_to = offset;
- }
+
-+ if (rsize)
-+ git_hash_update(ctx, ibuf, rsize);
++ git_hash_update(ctx, ibuf, rsize);
+
s.next_in = ibuf;
s.avail_in = rsize;
@@ object-file.c: static int index_blob_packfile_transaction(struct odb_transaction
+ * If writing another object to the packfile could result in it
+ * exceeding the configured size limit, flush the current packfile
+ * transaction.
++ *
++ * Note that this uses the inflated object size as an approximation.
++ * Blob objects written in this manner are not delta-compressed, so
++ * the difference between the inflated and on-disk size is limited
++ * to zlib compression and is sufficient for this check.
+ */
+ if (state->nr_written && pack_size_limit_cfg &&
+ pack_size_limit_cfg < state->offset + size)
5: a25e5a2451 ! 6: d6c4187a0f object-file: generalize packfile writes to use odb_write_stream
@@ Commit message
Signed-off-by: Justin Tobler <jltobler@gmail.com>
## object-file.c ##
-@@ object-file.c: static int hash_blob_stream(const struct git_hash_algo *hash_algo,
+@@ object-file.c: static int hash_blob_stream(struct odb_write_stream *stream,
}
/*
@@ object-file.c: static int hash_blob_stream(const struct git_hash_algo *hash_algo
+ struct odb_write_stream *stream)
{
git_zstream s;
-- unsigned char ibuf[16384];
+ unsigned char ibuf[16384];
unsigned char obuf[16384];
unsigned hdrlen;
int status = Z_OK;
@@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
- die("failed to read %u bytes from '%s'",
- (unsigned)rsize, path);
+ if (!stream->is_finished && !s.avail_in) {
-+ unsigned long rsize;
-+ unsigned const char *buf = stream->read(stream, &rsize);
++ ssize_t rsize = stream->read(stream, ibuf, sizeof(ibuf));
++
++ if (rsize < 0)
++ die("failed to read blob data");
- if (rsize)
-- git_hash_update(ctx, ibuf, rsize);
-+ git_hash_update(ctx, buf, rsize);
+ git_hash_update(ctx, ibuf, rsize);
-- s.next_in = ibuf;
-+ s.next_in = (unsigned char *)buf;
+ s.next_in = ibuf;
s.avail_in = rsize;
- size -= rsize;
+ total += rsize;
@@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
}
+
+ if (total != size)
-+ die("unexpected number of bytes read");
++ die("read %" PRIuMAX " bytes of blob data, but expected %" PRIuMAX " bytes",
++ (uintmax_t)total, (uintmax_t)size);
+
git_deflate_end(&s);
}
-@@ object-file.c: static void flush_packfile_transaction(struct odb_transaction_files *transaction
- odb_reprepare(repo->objects);
- }
-
-+struct read_object_fd_data {
-+ int fd;
-+ size_t size;
-+ unsigned char buf[16384];
-+};
-+
-+static const void *read_object_fd(struct odb_write_stream *stream,
-+ unsigned long *len)
-+{
-+ struct read_object_fd_data *data = stream->data;
-+ ssize_t read_result;
-+ size_t rsize;
-+
-+ if (stream->is_finished) {
-+ *len = 0;
-+ return NULL;
-+ }
-+
-+ rsize = data->size < sizeof(data->buf) ? data->size : sizeof(data->buf);
-+ read_result = read_in_full(data->fd, data->buf, rsize);
-+ if (read_result < 0)
-+ die_errno("failed to read blob data");
-+ if ((size_t)read_result != rsize)
-+ die("failed to read %u bytes of blob data", (unsigned)rsize);
-+
-+ data->size -= rsize;
-+ if (!data->size)
-+ stream->is_finished = 1;
-+
-+ *len = rsize;
-+
-+ return data->buf;
-+}
-+
- /*
- * This writes the specified object to a packfile. Objects written here
- * during the same transaction are written to the same packfile. The
@@ object-file.c: static void flush_packfile_transaction(struct odb_transaction_files *transaction
* binary blobs, they generally do not want to get any conversion, and
* callers should avoid this code path when filters are requested.
@@ object-file.c: static int index_blob_packfile_transaction(struct odb_transaction
idx->crc32 = crc32_end(state->f);
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
- } else {
+
if (flags & INDEX_WRITE_OBJECT) {
struct object_database *odb = the_repository->objects;
- struct odb_transaction_files *files_transaction;
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
- ret = index_blob_packfile_transaction(files_transaction, oid, fd,
- xsize_t(st->st_size), path);
+ struct odb_transaction *transaction = odb_transaction_begin(odb);
-+ struct read_object_fd_data data = {
-+ .fd = fd,
-+ .size = xsize_t(st->st_size),
-+ };
-+ struct odb_write_stream in_stream = {
-+ .read = read_object_fd,
-+ .data = &data,
-+ };
+
+ ret = index_blob_packfile_transaction(odb->transaction,
-+ &in_stream,
++ &stream,
+ xsize_t(st->st_size),
+ oid);
odb_transaction_commit(transaction);
} else {
- if (hash_blob_stream(the_repository->hash_algo, oid, fd,
+ ret = hash_blob_stream(&stream,
6: bbc62ec512 ! 7: 2b81e94677 odb/transaction: make `write_object_stream()` pluggable
@@ Commit message
Signed-off-by: Justin Tobler <jltobler@gmail.com>
## object-file.c ##
+@@ object-file.c: static void flush_packfile_transaction(struct odb_transaction_files *transaction
+ * binary blobs, they generally do not want to get any conversion, and
+ * callers should avoid this code path when filters are requested.
+ */
+-static int index_blob_packfile_transaction(struct odb_transaction *base,
+- struct odb_write_stream *stream,
+- size_t size, struct object_id *result_oid)
++static int odb_transaction_files_write_object_stream(struct odb_transaction *base,
++ struct odb_write_stream *stream,
++ size_t size,
++ struct object_id *result_oid)
+ {
+ struct odb_transaction_files *transaction = container_of(base,
+ struct odb_transaction_files,
@@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
- .data = &data,
- };
+ struct object_database *odb = the_repository->objects;
+ struct odb_transaction *transaction = odb_transaction_begin(odb);
- ret = index_blob_packfile_transaction(odb->transaction,
-- &in_stream,
+- &stream,
- xsize_t(st->st_size),
- oid);
+ ret = odb_transaction_write_object_stream(odb->transaction,
-+ &in_stream,
++ &stream,
+ xsize_t(st->st_size),
+ oid);
odb_transaction_commit(transaction);
} else {
- if (hash_blob_stream(the_repository->hash_algo, oid, fd,
+ ret = hash_blob_stream(&stream,
@@ object-file.c: struct odb_transaction *odb_transaction_files_begin(struct odb_source *source)
transaction = xcalloc(1, sizeof(*transaction));
transaction->base.source = source;
transaction->base.commit = odb_transaction_files_commit;
-+ transaction->base.write_object_stream = index_blob_packfile_transaction;
++ transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
return &transaction->base;
}
@@ odb/transaction.h
+
+ /*
+ * This callback is expected to write the given object stream into
-+ * the ODB transaction.
++ * the ODB transaction. Note that for now, only blobs support streaming.
+ *
+ * The resulting object ID shall be written into the out pointer. The
+ * callback is expected to return 0 on success, a negative error code
base-commit: 5361983c075154725be47b65cca9a2421789e410
--
2.53.0.381.g628a66ccf6
Thread overview: 57+ messages
2026-03-31 3:38 [PATCH 0/6] odb: add write operation to ODB transaction interface Justin Tobler
2026-03-31 3:38 ` [PATCH 1/6] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 13:56 ` Justin Tobler
2026-03-31 15:58 ` Junio C Hamano
2026-03-31 16:44 ` Justin Tobler
2026-03-31 3:38 ` [PATCH 2/6] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 3:38 ` [PATCH 3/6] object-file: remove flags from transaction packfile writes Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 14:10 ` Justin Tobler
2026-03-31 3:38 ` [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 14:14 ` Justin Tobler
2026-03-31 3:38 ` [PATCH 5/6] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 14:31 ` Justin Tobler
2026-03-31 22:59 ` Patrick Steinhardt
2026-03-31 23:21 ` Justin Tobler
2026-03-31 23:40 ` Patrick Steinhardt
2026-03-31 3:38 ` [PATCH 6/6] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-03-31 7:48 ` Patrick Steinhardt
2026-03-31 14:40 ` Justin Tobler
2026-04-01 3:03 ` Justin Tobler [this message]
2026-04-01 3:03 ` [PATCH v2 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-01 3:03 ` [PATCH v2 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-01 3:03 ` [PATCH v2 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-04-01 11:23 ` Patrick Steinhardt
2026-04-01 3:03 ` [PATCH v2 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-01 11:23 ` Patrick Steinhardt
2026-04-01 14:02 ` Justin Tobler
2026-04-01 3:03 ` [PATCH v2 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-01 3:03 ` [PATCH v2 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-01 3:03 ` [PATCH v2 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-01 11:24 ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-04-02 21:32 ` [PATCH v3 " Justin Tobler
2026-04-02 21:32 ` [PATCH v3 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-02 21:32 ` [PATCH v3 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-02 21:32 ` [PATCH v3 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-11 17:58 ` Jeff King
2026-05-12 15:19 ` Justin Tobler
2026-04-02 21:32 ` [PATCH v3 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-06 20:16 ` Jeff King
2026-04-06 20:19 ` Jeff King
2026-04-02 21:32 ` [PATCH v3 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-02 21:32 ` [PATCH v3 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-02 21:32 ` [PATCH v3 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-08 7:25 ` [PATCH v3 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-05-14 18:37 ` [PATCH v4 " Justin Tobler
2026-05-14 18:37 ` [PATCH v4 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-05-14 18:37 ` [PATCH v4 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-05-14 18:37 ` [PATCH v4 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-14 18:37 ` [PATCH v4 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-05-14 18:37 ` [PATCH v4 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-05-14 18:37 ` [PATCH v4 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-05-14 18:37 ` [PATCH v4 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-05-15 3:56 ` [PATCH v4 0/7] odb: add write operation to ODB transaction interface Jeff King