All of lore.kernel.org
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, gitster@pobox.com, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v3 0/7] odb: add write operation to ODB transaction interface
Date: Thu,  2 Apr 2026 16:32:13 -0500	[thread overview]
Message-ID: <20260402213220.2651523-1-jltobler@gmail.com> (raw)
In-Reply-To: <20260401030316.1847362-1-jltobler@gmail.com>

Greetings,

This series lays the groundwork for introducing write operations to the
ODB transaction interface. The eventual goal is for all object writes
performed within a transaction to go through this interface explicitly,
rather than implicitly relying on the transaction to reconfigure ODB
sources so that writes are redirected to a temporary location.

For now, only `odb_transaction_write_object_stream()` is implemented and
wires up the existing logic for streaming "large" blobs directly into a
packfile as part of the transaction.

Most of the patches are structural refactorings to enable this, but
patch 4 introduces a behavioral change in how packfiles that would
exceed "pack.packSizeLimit" are handled.

Changes since V2:
- Renamed some variables to improve clarity
- Make `odb_write_stream_from_fd()` fully initialize the underlying
  `struct odb_write_stream`
- Move `struct odb_write_stream` to "odb/streaming.h"
- Make the `hash_blob_stream()` helper more generic by operating on a
  `struct odb_write_stream` instead of reading from an fd directly.
- Introduce an `odb_write_stream_release()` helper to free the
  underlying stream data.

Changes since V1:
- Fixed some typos
- Improved error handling
- Removed unnecessary guard statement
- Documented in comments why inflated object size is used to approximate
  if object write will exceed "pack.packSizeLimit".
- Updated `struct odb_write_stream` read() callback to support returning
  errors and using caller provided buffer
- Updated the `hash_blob_stream()` function signature to operate on a
  `struct odb_write_stream` instead of an fd directly
- Renamed some variables/functions for better clarity

Thanks,
-Justin

Justin Tobler (7):
  odb: split `struct odb_transaction` into separate header
  odb/transaction: use pluggable `begin_transaction()`
  odb: update `struct odb_write_stream` read() callback
  object-file: remove flags from transaction packfile writes
  object-file: avoid fd seekback by checking object size upfront
  object-file: generalize packfile writes to use odb_write_stream
  odb/transaction: make `write_object_stream()` pluggable

 Makefile                 |   1 +
 builtin/add.c            |   1 +
 builtin/unpack-objects.c |  21 ++--
 builtin/update-index.c   |   1 +
 cache-tree.c             |   1 +
 meson.build              |   1 +
 object-file.c            | 237 ++++++++++++++++++++-------------------
 odb.c                    |  25 -----
 odb.h                    |  37 +-----
 odb/streaming.c          |  51 +++++++++
 odb/streaming.h          |  30 +++++
 odb/transaction.c        |  35 ++++++
 odb/transaction.h        |  57 ++++++++++
 read-cache.c             |   1 +
 14 files changed, 311 insertions(+), 188 deletions(-)
 create mode 100644 odb/transaction.c
 create mode 100644 odb/transaction.h

Range-diff against v2:
1:  eee372b426 = 1:  eee372b426 odb: split `struct odb_transaction` into separate header
2:  57ac075560 = 2:  57ac075560 odb/transaction: use pluggable `begin_transaction()`
3:  556f003d0a ! 3:  11321ad607 odb: update `struct odb_write_stream` read() callback
    @@ Commit message
     
         Update the interface to instead require the caller to provide a buffer,
         and have the callback return the number of bytes written to it or a
    -    negative value on error. Call sites are updated accordingly.
    +    negative value on error. While at it, also move the `struct
    +    odb_write_stream` definition to "odb/streaming.h". Call sites are
    +    updated accordingly.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## builtin/unpack-objects.c ##
    +@@
    + #include "hex.h"
    + #include "object-file.h"
    + #include "odb.h"
    ++#include "odb/streaming.h"
    + #include "odb/transaction.h"
    + #include "object.h"
    + #include "delta.h"
     @@ builtin/unpack-objects.c: static void unpack_non_delta_entry(enum object_type type, unsigned long size,
      
      struct input_zstream_data {
    @@ object-file.c: int odb_source_loose_write_stream(struct odb_source *source,
     -			const void *in = in_stream->read(in_stream, &stream.avail_in);
     -			stream.next_in = (void *)in;
     -			in0 = (unsigned char *)in;
    -+			ssize_t read_len = in_stream->read(in_stream, buf, sizeof(buf));
    ++			ssize_t read_len = odb_write_stream_read(in_stream, buf,
    ++								 sizeof(buf));
     +			if (read_len < 0) {
     +				err = -1;
     +				goto cleanup;
    @@ object-file.c: int odb_source_loose_write_stream(struct odb_source *source,
     
      ## odb.h ##
     @@ odb.h: static inline int odb_write_object(struct object_database *odb,
    + 	return odb_write_object_ext(odb, buf, len, type, oid, NULL, 0);
      }
      
    - struct odb_write_stream {
    +-struct odb_write_stream {
     -	const void *(*read)(struct odb_write_stream *, unsigned long *len);
    -+	ssize_t (*read)(struct odb_write_stream *, unsigned char *, size_t len);
    - 	void *data;
    - 	int is_finished;
    - };
    +-	void *data;
    +-	int is_finished;
    +-};
    ++struct odb_write_stream;
    + 
    + int odb_write_object_stream(struct object_database *odb,
    + 			    struct odb_write_stream *stream, size_t len,
    +
    + ## odb/streaming.c ##
    +@@ odb/streaming.c: struct odb_read_stream *odb_read_stream_open(struct object_database *odb,
    + 	return st;
    + }
    + 
    ++ssize_t odb_write_stream_read(struct odb_write_stream *st, void *buf, size_t sz)
    ++{
    ++	return st->read(st, buf, sz);
    ++}
    ++
    + int odb_stream_blob_to_fd(struct object_database *odb,
    + 			  int fd,
    + 			  const struct object_id *oid,
    +
    + ## odb/streaming.h ##
    +@@ odb/streaming.h: int odb_read_stream_close(struct odb_read_stream *stream);
    +  */
    + ssize_t odb_read_stream_read(struct odb_read_stream *stream, void *buf, size_t len);
    + 
    ++/*
    ++ * A stream that provides an object to be written to the object database without
    ++ * loading all of it into memory.
    ++ */
    ++struct odb_write_stream {
    ++	ssize_t (*read)(struct odb_write_stream *, unsigned char *, size_t);
    ++	void *data;
    ++	int is_finished;
    ++};
    ++
    ++/*
    ++ * Read data from the stream into the buffer. Returns 0 when finished and the
    ++ * number of bytes read on success. Returns a negative error code in case
    ++ * reading from the stream fails.
    ++ */
    ++ssize_t odb_write_stream_read(struct odb_write_stream *stream, void *buf,
    ++			      size_t len);
    ++
    + /*
    +  * Look up the object by its ID and write the full contents to the file
    +  * descriptor. The object must be a blob, or the function will fail. When
4:  a9f0e5ad8a ! 4:  72d4656eee object-file: remove flags from transaction packfile writes
    @@ object-file.c: static void prepare_packfile_transaction(struct odb_transaction_f
     +	unsigned char buf[16384];
     +	struct git_hash_ctx ctx;
     +	unsigned header_len;
    -+	size_t total = 0;
    ++	size_t bytes_hashed = 0;
     +
     +	header_len = format_object_header((char *)buf, sizeof(buf),
     +					  OBJ_BLOB, size);
    @@ object-file.c: static void prepare_packfile_transaction(struct odb_transaction_f
     +	git_hash_update(&ctx, buf, header_len);
     +
     +	while (!stream->is_finished) {
    -+		ssize_t read_result = stream->read(stream, buf, sizeof(buf));
    ++		ssize_t read_result = odb_write_stream_read(stream, buf,
    ++							    sizeof(buf));
     +
     +		if (read_result < 0)
     +			return -1;
     +
     +		git_hash_update(&ctx, buf, read_result);
    -+		total += read_result;
    ++		bytes_hashed += read_result;
     +	}
     +
    -+	if (total != size)
    ++	if (bytes_hashed != size)
     +		return -1;
     +
     +	git_hash_final_oid(result_oid, &ctx);
    @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
     -						      xsize_t(st->st_size),
     -						      path, flags);
     -		odb_transaction_commit(transaction);
    -+		struct odb_write_stream stream = { 0 };
    ++		struct odb_write_stream stream;
     +		odb_write_stream_from_fd(&stream, fd, xsize_t(st->st_size));
     +
     +		if (flags & INDEX_WRITE_OBJECT) {
    @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
     +					       xsize_t(st->st_size));
     +		}
     +
    -+		free(stream.data);
    ++		odb_write_stream_release(&stream);
      	}
      
      	close(fd);
     
      ## odb/streaming.c ##
    +@@ odb/streaming.c: ssize_t odb_write_stream_read(struct odb_write_stream *st, void *buf, size_t sz)
    + 	return st->read(st, buf, sz);
    + }
    + 
    ++void odb_write_stream_release(struct odb_write_stream *st)
    ++{
    ++	free(st->data);
    ++}
    ++
    + int odb_stream_blob_to_fd(struct object_database *odb,
    + 			  int fd,
    + 			  const struct object_id *oid,
     @@ odb/streaming.c: int odb_stream_blob_to_fd(struct object_database *odb,
      	odb_read_stream_close(st);
      	return result;
    @@ odb/streaming.c: int odb_stream_blob_to_fd(struct object_database *odb,
     +
     +	stream->data = data;
     +	stream->read = read_object_fd;
    ++	stream->is_finished = 0;
     +}
     
      ## odb/streaming.h ##
    @@ odb/streaming.h
      
      struct object_database;
      struct odb_read_stream;
    +@@ odb/streaming.h: struct odb_write_stream {
    + ssize_t odb_write_stream_read(struct odb_write_stream *stream, void *buf,
    + 			      size_t len);
    + 
    ++/*
    ++ * Releases memory allocated for underlying stream data.
    ++ */
    ++void odb_write_stream_release(struct odb_write_stream *stream);
    ++
    + /*
    +  * Look up the object by its ID and write the full contents to the file
    +  * descriptor. The object must be a blob, or the function will fail. When
     @@ odb/streaming.h: int odb_stream_blob_to_fd(struct object_database *odb,
      			  struct stream_filter *filter,
      			  int can_seek);
      
     +/*
    -+ * Sets up an ODB write stream that reads from an fd. The caller is expected to
    -+ * free the underlying stream data.
    ++ * Sets up an ODB write stream that reads from an fd.
     + */
     +void odb_write_stream_from_fd(struct odb_write_stream *stream, int fd,
     +			      size_t size);
5:  b7ac82ed7e = 5:  e4896101ff object-file: avoid fd seekback by checking object size upfront
6:  d6c4187a0f ! 6:  b3cb0a707c object-file: generalize packfile writes to use odb_write_stream
    @@ object-file.c: static int hash_blob_stream(struct odb_write_stream *stream,
      	unsigned char obuf[16384];
      	unsigned hdrlen;
      	int status = Z_OK;
    -+	size_t total = 0;
    ++	size_t bytes_read = 0;
      
      	git_deflate_init(&s, pack_compression_level);
      
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
     -				die("failed to read %u bytes from '%s'",
     -				    (unsigned)rsize, path);
     +		if (!stream->is_finished && !s.avail_in) {
    -+			ssize_t rsize = stream->read(stream, ibuf, sizeof(ibuf));
    ++			ssize_t rsize = odb_write_stream_read(stream, ibuf,
    ++							      sizeof(ibuf));
     +
     +			if (rsize < 0)
     +				die("failed to read blob data");
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
      			s.next_in = ibuf;
      			s.avail_in = rsize;
     -			size -= rsize;
    -+			total += rsize;
    ++			bytes_read += rsize;
      		}
      
     -		status = git_deflate(&s, size ? 0 : Z_FINISH);
    @@ object-file.c: static void stream_blob_to_pack(struct transaction_packfile *stat
      		}
      	}
     +
    -+	if (total != size)
    ++	if (bytes_read != size)
     +		die("read %" PRIuMAX " bytes of blob data, but expected %" PRIuMAX " bytes",
    -+		    (uintmax_t)total, (uintmax_t)size);
    ++		    (uintmax_t)bytes_read, (uintmax_t)size);
     +
      	git_deflate_end(&s);
      }
7:  2b81e94677 ! 7:  e1d292a7ed odb/transaction: make `write_object_stream()` pluggable
    @@ Commit message
         How an ODB transaction handles writing objects is expected to vary
         between implementations. Introduce a new `write_object_stream()`
         callback in `struct odb_transaction` to make this function pluggable.
    -    Wire up `index_blob_packfile_transaction()` for use with `struct
    -    odb_transaction_files` accordingly.
    +    Rename `index_blob_packfile_transaction()` to
    +    `odb_transaction_files_write_object_stream()` and wire it up for use
    +    with `struct odb_transaction_files` accordingly.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     

base-commit: 5361983c075154725be47b65cca9a2421789e410
-- 
2.53.0.381.g628a66ccf6


  parent reply	other threads:[~2026-04-02 21:32 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31  3:38 [PATCH 0/6] odb: add write operation to ODB transaction interface Justin Tobler
2026-03-31  3:38 ` [PATCH 1/6] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 13:56     ` Justin Tobler
2026-03-31 15:58       ` Junio C Hamano
2026-03-31 16:44         ` Justin Tobler
2026-03-31  3:38 ` [PATCH 2/6] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 3/6] object-file: remove flags from transaction packfile writes Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:10     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:14     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 5/6] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:31     ` Justin Tobler
2026-03-31 22:59       ` Patrick Steinhardt
2026-03-31 23:21         ` Justin Tobler
2026-03-31 23:40           ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 6/6] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:40     ` Justin Tobler
2026-04-01  3:03 ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Justin Tobler
2026-04-01  3:03   ` [PATCH v2 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-01  3:03   ` [PATCH v2 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01  3:03   ` [PATCH v2 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01 14:02       ` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-01  3:03   ` [PATCH v2 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-01  3:03   ` [PATCH v2 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-01 11:24   ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-04-02 21:32   ` Justin Tobler [this message]
2026-04-02 21:32     ` [PATCH v3 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-02 21:32     ` [PATCH v3 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-11 17:58       ` Jeff King
2026-05-12 15:19         ` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-06 20:16       ` Jeff King
2026-04-06 20:19         ` Jeff King
2026-04-02 21:32     ` [PATCH v3 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-02 21:32     ` [PATCH v3 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-02 21:32     ` [PATCH v3 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-08  7:25     ` [PATCH v3 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-05-14 18:37     ` [PATCH v4 " Justin Tobler
2026-05-14 18:37       ` [PATCH v4 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-05-14 18:37       ` [PATCH v4 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-05-14 18:37       ` [PATCH v4 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-14 18:37       ` [PATCH v4 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-05-14 18:37       ` [PATCH v4 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-05-14 18:37       ` [PATCH v4 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-05-14 18:37       ` [PATCH v4 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-05-15  3:56       ` [PATCH v4 0/7] odb: add write operation to ODB transaction interface Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260402213220.2651523-1-jltobler@gmail.com \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.