git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org,  ps@pks.im
Subject: Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()
Date: Fri, 22 Aug 2025 09:49:39 -0700	[thread overview]
Message-ID: <xmqqqzx3xpwc.fsf@gitster.g> (raw)
In-Reply-To: <20250821232249.319427-4-jltobler@gmail.com> (Justin Tobler's message of "Thu, 21 Aug 2025 18:22:48 -0500")

Justin Tobler <jltobler@gmail.com> writes:

>  /*
> - * This creates one packfile per large blob unless bulk-checkin
> - * machinery is "plugged".
> + * This writes the specified object to a packfile. Objects written here
> + * during the same transaction are written to the same packfile. The
> + * packfile is not flushed until the transaction is flushed. The caller
> + * is expected to ensure a valid transaction is setup for objects to be
> + * recorded to.
>   *
>   * This also bypasses the usual "convert-to-git" dance, and that is on
>   * purpose. We could write a streaming version of the converting
> diff --git a/object-file.c b/object-file.c
> index 1740aa2b2e3..bc15af42450 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid,
>  	 * Call xsize_t() only when needed to avoid potentially unnecessary
>  	 * die() for large files.
>  	 */
> -	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path))
> +	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) {
>  		ret = index_stream_convert_blob(istate, oid, fd, path, flags);
> -	else if (!S_ISREG(st->st_mode))
> +	} else if (!S_ISREG(st->st_mode)) {
>  		ret = index_pipe(istate, oid, fd, type, path, flags);
> -	else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
> -		 type != OBJ_BLOB ||
> -		 (path && would_convert_to_git(istate, path)))
> +	} else if ((st->st_size >= 0 &&
> +		    (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
> +		   type != OBJ_BLOB ||
> +		   (path && would_convert_to_git(istate, path))) {
>  		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
>  				 type, path, flags);
> -	else
> -		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
> +	} else {
> +		struct odb_transaction *transaction;
> +
> +		transaction = begin_odb_transaction(the_repository->objects);
> +		ret = index_blob_bulk_checkin(transaction,
>  					      oid, fd, xsize_t(st->st_size),
>  					      path, flags);
> +		end_odb_transaction(transaction);
> +	}

Interesting.  If the caller does the odb transaction management
itself by calling begin/end before calling this function and fed two
or more large objects, the original code did the right thing with
.nesting set to 1.  In the new code, it still does the right thing
even during the call to index_blob_bulk_checkin we raise .nesting by
one, because the all non-zero .nesting values mean the same thing to
the machinery, thanks to lazy initialization of the .objdir.

So, isn't the comment above the function now less accurate than
before?  The caller of this function does not have to do anything
and we do not expect the caller to "ensure a valid transaction" at
all, no?

Thanks.





  reply	other threads:[~2025-08-22 16:49 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20 22:55 [PATCH 0/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 1/3] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-20 22:55 ` [PATCH 2/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 3/3] bulk-checkin: wire repository variable Justin Tobler
2025-08-21  0:15   ` Junio C Hamano
2025-08-21 20:26     ` Justin Tobler
2025-08-21 20:32       ` Junio C Hamano
2025-08-21  0:00 ` [PATCH 0/3] bulk-checkin: remove global transaction state Junio C Hamano
2025-08-21 23:22 ` [PATCH v2 0/4] " Justin Tobler
2025-08-21 23:22   ` [PATCH v2 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-21 23:22   ` [PATCH v2 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 16:37     ` Junio C Hamano
2025-08-22 18:07       ` Justin Tobler
2025-08-22 20:25         ` Junio C Hamano
2025-08-21 23:22   ` [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 16:49     ` Junio C Hamano [this message]
2025-08-22 19:13       ` Justin Tobler
2025-08-22 20:33         ` Junio C Hamano
2025-08-21 23:22   ` [PATCH v2 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-22 17:03     ` Junio C Hamano
2025-08-22 19:38       ` Justin Tobler
2025-08-22 21:34   ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-22 21:34     ` [PATCH v3 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 21:35     ` [PATCH v3 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-25 20:25     ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqqzx3xpwc.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).