All of lore.kernel.org
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront
Date: Tue, 31 Mar 2026 09:14:02 -0500	[thread overview]
Message-ID: <acvV0_7DqGy_q9GY@denethor> (raw)
In-Reply-To: <act8W1BEg6iyUpHB@pks.im>

On 26/03/31 09:48AM, Patrick Steinhardt wrote:
> On Mon, Mar 30, 2026 at 10:38:33PM -0500, Justin Tobler wrote:
> > In certain scenarios, Git handles writing blobs that exceed
> > "core.bigFilesThreshold" differently by streaming the object directly
> > into a packfile. When there is an active ODB transaction, these blobs
> > are streamed to the same packfile instead of using a separate packfile
> > for each. If "pack.packSizeLimit" is configured and streaming another
> > object causes the packfile to exceed the configured limit, the packfile
> > is truncated back to the previous object and the object write is
> > restarted in a new packfile.
> > 
> > This works fine, but requires the fd being read from to save a
> > checkpoint so it becomes possible to rewind the input source via seeking
> > back to a known offset at the beginning. In a subsequent commit, blob
> > streaming is converted to use `struct odb_write_stream` as a more
> > generic input source instead of an fd which doesn't provide a mechanism
> > for rewinding.
> > 
> > For this use case though, rewinding the fd is not strictly necessary
> > because the inflated size of the object is known and can be used to
> > approximate whether writing the object would cause the packfile to
> > exceed the configured limit prior to writing anything. These blobs
> > written to the packfile are never deltafied thus the size difference
> 
> s/deltafied/deltified/

Will fix.


> > diff --git a/object-file.c b/object-file.c
> > index 493173eaf4..1de2244ac5 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1473,15 +1461,10 @@ static int stream_blob_to_pack(struct transaction_packfile *state,
> >  			if ((size_t)read_result != rsize)
> >  				die("failed to read %u bytes from '%s'",
> >  				    (unsigned)rsize, path);
> > -			offset += rsize;
> > -			if (*already_hashed_to < offset) {
> > -				size_t hsize = offset - *already_hashed_to;
> > -				if (rsize < hsize)
> > -					hsize = rsize;
> > -				if (hsize)
> > -					git_hash_update(ctx, ibuf, hsize);
> > -				*already_hashed_to = offset;
> > -			}
> > +
> > +			if (rsize)
> > +				git_hash_update(ctx, ibuf, rsize);
> 
> Is this guard really needed? I wouldn't expect that we ever try to read
> zero bytes into `ibuf`, and we bail in case we didn't receive the
> expected number of bytes.
> 
> And even if we did, `git_hash_update()` works just fine with no data.

Ya you are right, this guard is not needed. Will remove in the next
version.


> >  	header_len = format_object_header((char *)obuf, sizeof(obuf),
> >  					  OBJ_BLOB, size);
> >  	transaction->base.source->odb->repo->hash_algo->init_fn(&ctx);
> >  	git_hash_update(&ctx, obuf, header_len);
> >  
> > +	/*
> > +	 * If writing another object to the packfile could result in it
> > +	 * exceeding the configured size limit, flush the current packfile
> > +	 * transaction.
> > +	 */
> 
> Do we want to document that this intentionally works on the inflated
> size, not the deflated one, with the arguments mentioned in the commit
> message?

Good suggestion. Will update.

Thanks,
-Justin

  reply	other threads:[~2026-03-31 14:14 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31  3:38 [PATCH 0/6] odb: add write operation to ODB transaction interface Justin Tobler
2026-03-31  3:38 ` [PATCH 1/6] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 13:56     ` Justin Tobler
2026-03-31 15:58       ` Junio C Hamano
2026-03-31 16:44         ` Justin Tobler
2026-03-31  3:38 ` [PATCH 2/6] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 3/6] object-file: remove flags from transaction packfile writes Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:10     ` Justin Tobler
2026-03-31  3:38 ` [PATCH 4/6] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:14     ` Justin Tobler [this message]
2026-03-31  3:38 ` [PATCH 5/6] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:31     ` Justin Tobler
2026-03-31 22:59       ` Patrick Steinhardt
2026-03-31 23:21         ` Justin Tobler
2026-03-31 23:40           ` Patrick Steinhardt
2026-03-31  3:38 ` [PATCH 6/6] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-03-31  7:48   ` Patrick Steinhardt
2026-03-31 14:40     ` Justin Tobler
2026-04-01  3:03 ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Justin Tobler
2026-04-01  3:03   ` [PATCH v2 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-01  3:03   ` [PATCH v2 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01  3:03   ` [PATCH v2 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-01 11:23     ` Patrick Steinhardt
2026-04-01 14:02       ` Justin Tobler
2026-04-01  3:03   ` [PATCH v2 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-01  3:03   ` [PATCH v2 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-01  3:03   ` [PATCH v2 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-01 11:24   ` [PATCH v2 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-04-02 21:32   ` [PATCH v3 " Justin Tobler
2026-04-02 21:32     ` [PATCH v3 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-04-02 21:32     ` [PATCH v3 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-11 17:58       ` Jeff King
2026-05-12 15:19         ` Justin Tobler
2026-04-02 21:32     ` [PATCH v3 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-04-06 20:16       ` Jeff King
2026-04-06 20:19         ` Jeff King
2026-04-02 21:32     ` [PATCH v3 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-04-02 21:32     ` [PATCH v3 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-04-02 21:32     ` [PATCH v3 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-04-08  7:25     ` [PATCH v3 0/7] odb: add write operation to ODB transaction interface Patrick Steinhardt
2026-05-14 18:37     ` [PATCH v4 " Justin Tobler
2026-05-14 18:37       ` [PATCH v4 1/7] odb: split `struct odb_transaction` into separate header Justin Tobler
2026-05-14 18:37       ` [PATCH v4 2/7] odb/transaction: use pluggable `begin_transaction()` Justin Tobler
2026-05-14 18:37       ` [PATCH v4 3/7] odb: update `struct odb_write_stream` read() callback Justin Tobler
2026-05-14 18:37       ` [PATCH v4 4/7] object-file: remove flags from transaction packfile writes Justin Tobler
2026-05-14 18:37       ` [PATCH v4 5/7] object-file: avoid fd seekback by checking object size upfront Justin Tobler
2026-05-14 18:37       ` [PATCH v4 6/7] object-file: generalize packfile writes to use odb_write_stream Justin Tobler
2026-05-14 18:37       ` [PATCH v4 7/7] odb/transaction: make `write_object_stream()` pluggable Justin Tobler
2026-05-15  3:56       ` [PATCH v4 0/7] odb: add write operation to ODB transaction interface Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acvV0_7DqGy_q9GY@denethor \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.