Git development
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, ps@pks.im, peff@peff.net,
	 luca.stefani.ge1@gmail.com
Subject: Re: [PATCH] object-file: avoid ODB transaction when not writing objects
Date: Tue, 7 Apr 2026 16:43:08 -0500	[thread overview]
Message-ID: <adV44IS5iPwdjKAe@denethor> (raw)
In-Reply-To: <xmqqo6ju31wx.fsf@gitster.g>

On 26/04/07 02:18PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> I'd appreciate it if you can give your eyeballs to the attached to
> see if that is how you would fixed the bug in the original context
> of v2.52 track.  If everything looks OK, then there is no need to
> spend time backporting on your side.  We have everything necessary.

The amended patch below looks correct to me. Peff had some comments
regarding the test that I will work on addressing.

Thanks,
-Justin

> 
> --- >8 ---
> From: Justin Tobler <jltobler@gmail.com>
> Date: Tue, 7 Apr 2026 15:17:30 -0500
> Subject: [PATCH] object-file: avoid ODB transaction when not writing objects
> 
> In ce1661f9da (odb: add transaction interface, 2025-09-16), existing
> ODB transaction logic is adapted to create a transaction interface
> at the ODB layer. The intent here is for the ODB transaction
> interface to eventually provide an object source agnostic means to
> manage transactions.
> 
> An unintended consequence of this change though is that
> `object-file.c:index_fd()` may enter the ODB transaction path even
> when no object write is requested. In non-repository contexts, this
> can result in a NULL dereference and segfault. One such case occurs
> when running git-diff(1) outside of a repository with
> "core.bigFileThreshold" forcing the streaming path in `index_fd()`:
> 
>         $ echo foo >foo
>         $ echo bar >bar
>         $ git -c core.bigFileThreshold=1 diff -- foo bar
> 
> In this scenario, the caller only needs to compute the object ID. Object
> hashing does not require an ODB, so starting a transaction is both
> unnecessary and invalid.
> 
> Fix the bug by avoiding the use of ODB transactions in `index_fd()` when
> callers are only interested in computing the object hash.
> 
> Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> [jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)]
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  object-file.c           | 49 ++++++++++++++++++++++++++++++++++-------
>  t/t1517-outside-repo.sh |  8 +++++++
>  2 files changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/object-file.c b/object-file.c
> index 17a236d2fe..0969e27f3d 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1599,6 +1599,34 @@ static int index_blob_packfile_transaction(struct odb_transaction *transaction,
>  	return 0;
>  }
>  
> +static int hash_blob_stream(const struct git_hash_algo *hash_algo,
> +			    struct object_id *result_oid, int fd, size_t size)
> +{
> +	unsigned char buf[16384];
> +	struct git_hash_ctx ctx;
> +	unsigned header_len;
> +
> +	header_len = format_object_header((char *)buf, sizeof(buf),
> +					  OBJ_BLOB, size);
> +	hash_algo->init_fn(&ctx);
> +	git_hash_update(&ctx, buf, header_len);
> +
> +	while (size) {
> +		size_t rsize = size < sizeof(buf) ? size : sizeof(buf);
> +		ssize_t read_result = read_in_full(fd, buf, rsize);
> +
> +		if ((read_result < 0) || ((size_t)read_result != rsize))
> +			return -1;
> +
> +		git_hash_update(&ctx, buf, rsize);
> +		size -= read_result;
> +	}
> +
> +	git_hash_final_oid(result_oid, &ctx);
> +
> +	return 0;
> +}
> +
>  int index_fd(struct index_state *istate, struct object_id *oid,
>  	     int fd, struct stat *st,
>  	     enum object_type type, const char *path, unsigned flags)
> @@ -1620,14 +1648,19 @@ int index_fd(struct index_state *istate, struct object_id *oid,
>  		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
>  				 type, path, flags);
>  	} else {
> -		struct odb_transaction *transaction;
> -
> -		transaction = odb_transaction_begin(the_repository->objects);
> -		ret = index_blob_packfile_transaction(the_repository->objects->transaction,
> -						      oid, fd,
> -						      xsize_t(st->st_size),
> -						      path, flags);
> -		odb_transaction_commit(transaction);
> +		if (flags & INDEX_WRITE_OBJECT) {
> +			struct odb_transaction *transaction;
> +
> +			transaction = odb_transaction_begin(the_repository->objects);
> +			ret = index_blob_packfile_transaction(the_repository->objects->transaction,
> +							      oid, fd,
> +							      xsize_t(st->st_size),
> +							      path, flags);
> +			odb_transaction_commit(transaction);
> +		} else {
> +			ret = hash_blob_stream(the_repository->hash_algo, oid,
> +					       fd, xsize_t(st->st_size));
> +		}
>  	}
>  
>  	close(fd);
> diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
> index c824c1a25c..c1dbc6359a 100755
> --- a/t/t1517-outside-repo.sh
> +++ b/t/t1517-outside-repo.sh
> @@ -93,6 +93,14 @@ test_expect_success 'diff outside repository' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff files exceeding bigFileThreshold outside repository' '
> +	cd non-repo &&
> +	echo foo >foo &&
> +	echo bar >bar &&
> +	test_must_fail git -c core.bigFileThreshold=1 diff -- foo bar >actual &&
> +	test_grep "diff --git a/foo b/bar" actual
> +'
> +
>  test_expect_success 'stripspace outside repository' '
>  	nongit git stripspace -s </dev/null
>  '
> -- 
> 2.54.0-rc0-282-gaf2821fd49
> 

  parent reply	other threads:[~2026-04-07 21:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-07 20:17 [PATCH] object-file: avoid ODB transaction when not writing objects Justin Tobler
2026-04-07 21:18 ` Junio C Hamano
2026-04-07 21:29   ` Jeff King
2026-04-07 21:43     ` Junio C Hamano
2026-04-07 21:43   ` Justin Tobler [this message]
2026-04-07 21:53   ` Junio C Hamano
2026-04-07 22:08     ` Justin Tobler
2026-04-07 22:24       ` Junio C Hamano
2026-04-07 22:41         ` Justin Tobler
2026-04-08  0:42           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adV44IS5iPwdjKAe@denethor \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=luca.stefani.ge1@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox