Git development
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, gitster@pobox.com, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v3 0/7] odb: add write operation to ODB transaction interface
Date: Thu,  2 Apr 2026 16:32:13 -0500	[thread overview]
Message-ID: <20260402213220.2651523-1-jltobler@gmail.com> (raw)
In-Reply-To: <20260401030316.1847362-1-jltobler@gmail.com>

Greetings,

This series lays the groundwork for introducing write operations to the
ODB transaction interface. The eventual goal is for all object writes
performed within a transaction to go through this interface explicitly,
rather than implicitly relying on the transaction to reconfigure ODB
sources so that writes are redirected to a temporary location.

For now, only `odb_transaction_write_object_stream()` is implemented and
wires up the existing logic for streaming "large" blobs directly into a
packfile as part of the transaction.

Most of the patches are structural refactorings to enable this, but
patch 4 introduces a behavioral change in how packfiles that would
exceed "pack.packSizeLimit" are handled.

Changes since V2:
- Renamed some variables to improve clarity
- Make `odb_write_stream_from_fd()` fully initialize the underlying
  `struct odb_write_stream`
- Move `struct odb_write_stream` to "odb/streaming.h"
- Make the `hash_blob_stream()` helper more generic by operating on a
  `struct odb_write_stream` instead of reading from an fd directly.
- Introduce an `odb_write_stream_release()` helper to free the
  underlying stream data.

Changes since V1:
- Fixed some typos
- Improved error handling
- Removed unnecessary guard statement
- Documented in comments why inflated object size is used to approximate
  if object write will exceed "pack.packSizeLimit".
- Updated `struct odb_write_stream` read() callback to support returning
  errors and using caller provided buffer
- Updated the `hash_blob_stream()` function signature to operate on a
  `struct odb_write_stream` instead of an fd directly
- Renamed some variables/functions for better clarity

Thanks,
-Justin

Justin Tobler (7):
  odb: split `struct odb_transaction` into separate header
  odb/transaction: use pluggable `begin_transaction()`
  odb: update `struct odb_write_stream` read() callback
  object-file: remove flags from transaction packfile writes
  object-file: avoid fd seekback by checking object size upfront
  object-file: generalize packfile writes to use odb_write_stream
  odb/transaction: make `write_object_stream()` pluggable

 Makefile                 |   1 +
 builtin/add.c            |   1 +
 builtin/unpack-objects.c |  21 ++--
 builtin/update-index.c   |   1 +
 cache-tree.c             |   1 +
 meson.build              |   1 +
 object-file.c            | 237 ++++++++++++++++++++-------------------
 odb.c                    |  25 -----
 odb.h                    |  37 +-----
 odb/streaming.c          |  51 +++++++++
 odb/streaming.h          |  30 +++++
 odb/transaction.c        |  35 ++++++
 odb/transaction.h        |  57 ++++++++++
 read-cache.c             |   1 +
 14 files changed, 311 insertions(+), 188 deletions(-)
 create mode 100644 odb/transaction.c
 create mode 100644 odb/transaction.h

Range-diff against v2:
1:  eee372b426 = 1:  eee372b426 odb: split `struct odb_transaction` into separate header
2:  57ac075560 = 2:  57ac075560 odb/transaction: use pluggable `begin_transaction()`
3:  556f003d0a ! 3:  11321ad607 odb: update `struct odb_write_stream` read() callback
    @@ Commit message
     
         Update the interface to instead require the caller to provide a buffer,
         and have the callback return the number of bytes written to it or a
    -    negative value on error. Call sites are updated accordingly.
    +    negative value on error. While at it, also move the `struct
    +    odb_write_stream` definition to "odb/streaming.h". Call sites are
    +    updated accordingly.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## builtin/unpack-objects.c ##
    +@@
    + #include "hex.h"
    + #include "object-file.h"
    + #include "odb.h"
    ++#include "odb/streaming.h"
    + #include "odb/transaction.h"
    + #include "object.h"
    + #include "delta.h"
     @@ builtin/unpack-objects.c: static void unpack_non_delta_entry(enum object_type type, unsigned long size,
      
      struct input_zstream_data {
    @@ object-file.c: int odb_source_loose_write_stream(struct odb_source *source,
     -			const void *in = in_stream->read(in_stream, &stream.avail_in);
     -			stream.next_in = (void *)in;
     -			in0 = (unsigned char *)in;
    -+			ssize_t read_len = in_stream->read(in_stream, buf, sizeof(buf));
    ++			ssize_t read_len = odb_write_stream_read(in_stream, buf,
    ++								 sizeof(buf));
     +			if (read_len < 0) {
     +				err = -1;
     +				goto cleanup;
    @@ object-file.c: int odb_source_loose_write_stream(struct odb_source *source,
     
      ## odb.h ##
     @@ odb.h: static inline int odb_write_object(struct object_database *odb,
    + 	return odb_write_object_ext(odb, buf, len, type, oid, NULL, 0);
      }
      
    - struct odb_write_stream {
    +-struct odb_write_stream {
     -	const void *(*read)(struct odb_write_stream *, unsigned long *len);
    -+	ssize_t (*read)(struct odb_write_stream *, unsigned char *, size_t len);
    - 	void *data;
    - 	int is_finished;
    - };
    +-	void *data;
    +-	int is_finished;
    +-};
    ++struct odb_write_stream;
    + 
    + int odb_write_object_stream(struct object_database *odb,
    + 			    struct odb_write_stream *stream, size_t len,
    +
    + ## odb/streaming.c ##
    +@@ odb/streaming.c: struct odb_read_stream *odb_read_stream_open(struct object_database *odb,
    + 	return st;
    + }
    + 
    ++ssize_t odb_write_stream_read(struct odb_write_stream *st, void *buf, size_t sz)
    ++{
    ++	return st->read(st, buf, sz);
    ++}
    ++
    + int odb_stream_blob_to_fd(struct object_database *odb,
    + 			  int fd,
    + 			  const struct object_id *oid,
    +
    + ## odb/streaming.h ##
    +@@ odb/streaming.h: int odb_read_stream_close(struct odb_read_stream *stream);
    +  */
    + ssize_t odb_read_stream_read(struct odb_read_stream *stream, void *buf, size_t len);
    + 
    ++/*
    ++ * A stream that provides an object to be written to the object database without
    ++ * loading all of it into memory.
    ++ */
    ++struct odb_write_stream {
    ++	ssize_t (*read)(struct odb_write_stream *, unsigned char *, size_t);
    ++	void *data;
    ++	int is_finished;
    ++};
    ++
    ++/*
    ++ * Read data from the stream into the buffer. Returns 0 when finished and the
    ++ * number of bytes read on success. Returns a negative error code in case
    ++ * reading from the stream fails.
    ++ */
    ++ssize_t odb_write_stream_read(struct odb_write_stream *stream, void *buf,
    ++			      size_t len);
    ++
    + /*
    +  * Look up the object by its ID and write the full contents to the file
    +  * descriptor. The object must be a blob, or the function will fail. When
4:  a9f0e5ad8a ! 4:  72d4656eee object-file: remove flags from transaction packfile writes
    @@ object-file.c: static void prepare_packfile_transaction(struct odb_transaction_f
     +	unsigned char buf[16384];
     +	struct git_hash_ctx ctx;
     +	unsigned header_len;
    -+	size_t total = 0;
    ++	size_t bytes_hashed = 0;
     +
     +	header_len = format_object_header((char *)buf, sizeof(buf),
     +					  OBJ_BLOB, size);
    @@ object-file.c: static void prepare_packfile_transaction(struct odb_transaction_f
     +	git_hash_update(&ctx, buf, header_len);
     +
     +	while (!stream->is_finished) {
    -+		ssize_t read_result = stream->read(stream, buf, sizeof(buf));
    ++		ssize_t read_result = odb_write_stream_read(stream, buf,
    ++							    sizeof(buf));
     +
     +		if (read_result < 0)
     +			return -1;
     +
     +		git_hash_update(&ctx, buf, read_result);
    -+		total += read_result;
    ++		bytes_hashed += read_result;
     +	}
     +
    -+	if (total != size)
    ++	if (bytes_hashed != size)
     +		return -1;
     +
     +	git_hash_final_oid(result_oid, &ctx);
    @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
     -						      xsize_t(st->st_size),
     -						      path, flags);
     -		odb_transaction_commit(transaction);
    -+		struct odb_write_stream stream = { 0 };
    ++		struct odb_write_stream stream;
     +		odb_write_stream_from_fd(&stream, fd, xsize_t(st->st_size));
     +
     +		if (flags & INDEX_WRITE_OBJECT) {
    @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
     +					       xsize_t(st->st_size));
     +		}
     +
    -+		free(stream.data);
    ++		odb_write_stream_release(&stream);
      	}
      
      	close(fd);
     
      ## odb/streaming.c ##
    +@@ odb/streaming.c: ssize_t odb_write_stream_read(struct odb_write_stream *st, void *buf, size_t sz)
    + 	return st->read(st, buf, sz);
    + }
    + 
    ++void odb_write_stream_release(struct odb_write_stream *st)
    ++{
    ++	free(st->data);
    ++}
    ++
    + int odb_stream_blob_to_fd(struct object_database *odb,
    + 			  int fd,
    + 			  const struct object_id *oid,
     @@ odb/streaming.c: int odb_stream_blob_to_fd(struct object_database *odb,
      	odb_read_stream_close(st);
      	return result;
    @@ odb/streaming.c: int odb_stream_blob_to_fd(struct object_database *odb,
     +
     +	stream->data = data;
     +	stream->read = read_object_fd;
    ++	stream->is_finished = 0;
     +}
     
      ## odb/streaming.h ##
    @@ odb/streaming.h
      
      struct object_database;
      struct odb_read_stream;
    +@@ odb/streaming.h: struct odb_write_stream {
    + ssize_t odb_write_stream_read(struct odb_write_stream *stream, void *buf,
    + 			      size_t len);
    + 
    ++/*
    ++ * Releases memory allocated for underlying stream data.
    ++ */
    ++void odb_write_stream_release(struct odb_write_stream *stream);
    ++
    + /*
    +  * Look up the object by its ID and write the full contents to the file
    +  * descriptor. The object must be a blob, or the function will fail. When
     @@ odb/streaming.h: int odb_stream_blob_to_fd(struct object_database *odb,
      			  struct stream_filter *filter,
      			  int can_seek);
      
     +/*
    -+ * Sets up an ODB write stream that reads from an fd. The caller is expected to
    -+ * free the underlying stream data.
    ++ * Sets up an ODB write stream that reads from an fd.
     + */
     +void odb_write_stream_from_fd(struct odb_write_stream *stream, int fd,
     +			      size_t size);
5:  b7ac82ed7e = 5:  e4896101ff object-file: avoid fd seekback by checking object size upfront
6:  d6c4187a0f ! 6:  b3cb0a707c object-file: generalize packfile writes to use odb_write_stream
    @@ object-file.c: static int hash_blob_stream(struct odb_write_stream *stream,
      	unsigned char obuf[16384];
      	unsigned hdrlen;
      	int status = Z_OK;
    -+	size_t total = 0;
    ++	size_t bytes_read = 0;
      
      	git_deflate_init(&s, pack_compression_level);
      
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
     -				die("failed to read %u bytes from '%s'",
     -				    (unsigned)rsize, path);
     +		if (!stream->is_finished && !s.avail_in) {
    -+			ssize_t rsize = stream->read(stream, ibuf, sizeof(ibuf));
    ++			ssize_t rsize = odb_write_stream_read(stream, ibuf,
    ++							      sizeof(ibuf));
     +
     +			if (rsize < 0)
     +				die("failed to read blob data");
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
      			s.next_in = ibuf;
      			s.avail_in = rsize;
     -			size -= rsize;
    -+			total += rsize;
    ++			bytes_read += rsize;
      		}
      
     -		status = git_deflate(&s, size ? 0 : Z_FINISH);
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
      		}
      	}
     +
    -+	if (total != size)
    ++	if (bytes_read != size)
     +		die("read %" PRIuMAX " bytes of blob data, but expected %" PRIuMAX " bytes",
    -+		    (uintmax_t)total, (uintmax_t)size);
    ++		    (uintmax_t)bytes_read, (uintmax_t)size);
     +
      	git_deflate_end(&s);
      }
7:  2b81e94677 ! 7:  e1d292a7ed odb/transaction: make `write_object_stream()` pluggable
    @@ Commit message
         How an ODB transaction handles writing objects is expected to vary
         between implementations. Introduce a new `write_object_stream()`
         callback in `struct odb_transaction` to make this function pluggable.
    -    Wire up `index_blob_packfile_transaction()` for use with `struct
    -    odb_transaction_files` accordingly.
    +    Rename `index_blob_packfile_transaction()` to
    +    `odb_transaction_files_write_object_stream()` and wire it up for use
    +    with `struct odb_transaction_files` accordingly.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     

base-commit: 5361983c075154725be47b65cca9a2421789e410
-- 
2.53.0.381.g628a66ccf6


  parent reply	other threads:[~2026-04-02 21:32 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31  3:38 [PATCH 0/6] odb: add write operation to ODB transaction interface Justin Tobler
2026-03-31  3:38 ` [PATCH 1/6] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 13:56     ` Justin Tobler
2026-03-31 15:58       ` Junio C Hamano
2026-03-31 16:44         ` Justin Tobler
2026-03-31  3:38 ` [PATCH 2/6] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 3/6] object-file: remove flags from transaction packfile writes Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:10     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:14     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 5/6] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:31     ` Justin Tobler
2026-03-31 22:59       ` Patrick Steinhardt
2026-03-31 23:21         ` Justin Tobler
2026-03-31 23:40           ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 6/6] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:40     ` Justin Tobler
2026-04-01  3:03 ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Justin Tobler
2026-04-01  3:03   ` [PATCH v2 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-01  3:03   ` [PATCH v2 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01  3:03   ` [PATCH v2 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01 14:02       ` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-01  3:03   ` [PATCH v2 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-01  3:03   ` [PATCH v2 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-01 11:24   ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-04-02 21:32   ` Justin Tobler [this message]
2026-04-02 21:32     ` [PATCH v3 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-02 21:32     ` [PATCH v3 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-11 17:58       ` Jeff King
2026-05-12 15:19         ` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-06 20:16       ` Jeff King
2026-04-06 20:19         ` Jeff King
2026-04-02 21:32     ` [PATCH v3 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-02 21:32     ` [PATCH v3 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-02 21:32     ` [PATCH v3 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-08  7:25     ` [PATCH v3 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-05-14 18:37     ` [PATCH v4 " Justin Tobler
2026-05-14 18:37       ` [PATCH v4 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-05-14 18:37       ` [PATCH v4 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-05-14 18:37       ` [PATCH v4 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-14 18:37       ` [PATCH v4 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-05-14 18:37       ` [PATCH v4 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-05-14 18:37       ` [PATCH v4 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-05-14 18:37       ` [PATCH v4 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-05-15  3:56       ` [PATCH v4 0/7] odb: add write operation to ODB transaction interface Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260402213220.2651523-1-jltobler@gmail.com \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox