Git development
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront
Date: Tue, 31 Mar 2026 09:14:02 -0500	[thread overview]
Message-ID: <acvV0_7DqGy_q9GY@denethor> (raw)
In-Reply-To: <act8W1BEg6iyUpHB@pks.im>

On 26/03/31 09:48AM, Patrick Steinhardt wrote:
> On Mon, Mar 30, 2026 at 10:38:33PM -0500, Justin Tobler wrote:
> > In certain scenarios, Git handles writing blobs that exceed
> > "core.bigFilesThreshold" differently by streaming the object directly
> > into a packfile. When there is an active ODB transaction, these blobs
> > are streamed to the same packfile instead of using a separate packfile
> > for each. If "pack.packSizeLimit" is configured and streaming another
> > object causes the packfile to exceed the configured limit, the packfile
> > is truncated back to the previous object and the object write is
> > restarted in a new packfile.
> > 
> > This works fine, but requires the fd being read from to save a
> > checkpoint so it becomes possible to rewind the input source via seeking
> > back to a known offset at the beginning. In a subsequent commit, blob
> > streaming is converted to use `struct odb_write_stream` as a more
> > generic input source instead of an fd which doesn't provide a mechanism
> > for rewinding.
> > 
> > For this use case though, rewinding the fd is not strictly necessary
> > because the inflated size of the object is known and can be used to
> > approximate whether writing the object would cause the packfile to
> > exceed the configured limit prior to writing anything. These blobs
> > written to the packfile are never deltafied thus the size difference
> 
> s/deltafied/deltified/

Will fix.


> > diff --git a/object-file.c b/object-file.c
> > index 493173eaf4..1de2244ac5 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1473,15 +1461,10 @@ static int stream_blob_to_pack(struct transaction_packfile *state,
> >  			if ((size_t)read_result != rsize)
> >  				die("failed to read %u bytes from '%s'",
> >  				    (unsigned)rsize, path);
> > -			offset += rsize;
> > -			if (*already_hashed_to < offset) {
> > -				size_t hsize = offset - *already_hashed_to;
> > -				if (rsize < hsize)
> > -					hsize = rsize;
> > -				if (hsize)
> > -					git_hash_update(ctx, ibuf, hsize);
> > -				*already_hashed_to = offset;
> > -			}
> > +
> > +			if (rsize)
> > +				git_hash_update(ctx, ibuf, rsize);
> 
> Is this guard really needed? I wouldn't expect that we ever try to read
> zero bytes into `ibuf`, and we bail in case we didn't receive the
> expected number of bytes.
> 
> And even if we did, `git_hash_update()` works just fine with no data.

Ya you are right, this guard is not needed. Will remove in the next
version.


> >  	header_len = format_object_header((char *)obuf, sizeof(obuf),
> >  					  OBJ_BLOB, size);
> >  	transaction->base.source->odb->repo->hash_algo->init_fn(&ctx);
> >  	git_hash_update(&ctx, obuf, header_len);
> >  
> > +	/*
> > +	 * If writing another object to the packfile could result in it
> > +	 * exceeding the configured size limit, flush the current packfile
> > +	 * transaction.
> > +	 */
> 
> Do we want to document that this intentionally works on the inflated
> size, not the deflated one, with the arguments mentioned in the commit
> message?

Good suggestion. Will update.

Thanks,
-Justin

  reply	other threads:[~2026-03-31 14:14 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31  3:38 [PATCH 0/6] odb: add write operation to ODB transaction interface Justin Tobler
2026-03-31  3:38 ` [PATCH 1/6] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 13:56     ` Justin Tobler
2026-03-31 15:58       ` Junio C Hamano
2026-03-31 16:44         ` Justin Tobler
2026-03-31  3:38 ` [PATCH 2/6] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 3/6] object-file: remove flags from transaction packfile writes Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:10     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:14     ` Justin Tobler [this message]
2026-03-31  3:38 ` [PATCH 5/6] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:31     ` Justin Tobler
2026-03-31 22:59       ` Patrick Steinhardt
2026-03-31 23:21         ` Justin Tobler
2026-03-31 23:40           ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 6/6] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:40     ` Justin Tobler
2026-04-01  3:03 ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Justin Tobler
2026-04-01  3:03   ` [PATCH v2 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-01  3:03   ` [PATCH v2 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01  3:03   ` [PATCH v2 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01 14:02       ` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-01  3:03   ` [PATCH v2 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-01  3:03   ` [PATCH v2 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-01 11:24   ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-04-02 21:32   ` [PATCH v3 " Justin Tobler
2026-04-02 21:32     ` [PATCH v3 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-02 21:32     ` [PATCH v3 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-11 17:58       ` Jeff King
2026-05-12 15:19         ` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-06 20:16       ` Jeff King
2026-04-06 20:19         ` Jeff King
2026-04-02 21:32     ` [PATCH v3 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-02 21:32     ` [PATCH v3 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-02 21:32     ` [PATCH v3 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-08  7:25     ` [PATCH v3 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-05-14 18:37     ` [PATCH v4 " Justin Tobler
2026-05-14 18:37       ` [PATCH v4 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-05-14 18:37       ` [PATCH v4 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-05-14 18:37       ` [PATCH v4 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-14 18:37       ` [PATCH v4 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-05-14 18:37       ` [PATCH v4 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-05-14 18:37       ` [PATCH v4 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-05-14 18:37       ` [PATCH v4 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-05-15  3:56       ` [PATCH v4 0/7] odb: add write operation to ODB transaction interface Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acvV0_7DqGy_q9GY@denethor \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox