git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org,  ps@pks.im
Subject: Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()
Date: Fri, 22 Aug 2025 13:33:45 -0700	[thread overview]
Message-ID: <xmqqa53rw0ye.fsf@gitster.g> (raw)
In-Reply-To: <p6bdiflg2fe7kgw47ntg46csjhmp63gkovtf4inlc42f4dga77@5oieqa4bydne> (Justin Tobler's message of "Fri, 22 Aug 2025 14:13:27 -0500")

Justin Tobler <jltobler@gmail.com> writes:

> On 25/08/22 09:49AM, Junio C Hamano wrote:
>> Justin Tobler <jltobler@gmail.com> writes:
>> 
>> >  /*
>> > - * This creates one packfile per large blob unless bulk-checkin
>> > - * machinery is "plugged".
>> > + * This writes the specified object to a packfile. Objects written here
>> > + * during the same transaction are written to the same packfile. The
>> > + * packfile is not flushed until the transaction is flushed. The caller
>> > + * is expected to ensure a valid transaction is setup for objects to be
>> > + * recorded to.
>> >   *
>> >   * This also bypasses the usual "convert-to-git" dance, and that is on
>> >   * purpose. We could write a streaming version of the converting
>> > diff --git a/object-file.c b/object-file.c
>> > index 1740aa2b2e3..bc15af42450 100644
>> > --- a/object-file.c
>> > +++ b/object-file.c
>> > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid,
>> >  	 * Call xsize_t() only when needed to avoid potentially unnecessary
>> >  	 * die() for large files.
>> >  	 */
>> > -	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path))
>> > +	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) {
>> >  		ret = index_stream_convert_blob(istate, oid, fd, path, flags);
>> > -	else if (!S_ISREG(st->st_mode))
>> > +	} else if (!S_ISREG(st->st_mode)) {
>> >  		ret = index_pipe(istate, oid, fd, type, path, flags);
>> > -	else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > -		 type != OBJ_BLOB ||
>> > -		 (path && would_convert_to_git(istate, path)))
>> > +	} else if ((st->st_size >= 0 &&
>> > +		    (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > +		   type != OBJ_BLOB ||
>> > +		   (path && would_convert_to_git(istate, path))) {
>> >  		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
>> >  				 type, path, flags);
>> > -	else
>> > -		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
>> > +	} else {
>> > +		struct odb_transaction *transaction;
>> > +
>> > +		transaction = begin_odb_transaction(the_repository->objects);
>> > +		ret = index_blob_bulk_checkin(transaction,
>> >  					      oid, fd, xsize_t(st->st_size),
>> >  					      path, flags);
>> > +		end_odb_transaction(transaction);
>> > +	}
>> 
>> Interesting.  If the caller does the odb transaction management
>> itself by calling begin/end before calling this function and fed two
>> or more large objects, the original code did the right thing with
>> .nesting set to 1.  In the new code, it still does the right thing
>> even during the call to index_blob_bulk_checkin we raise .nesting by
>> one, because the all non-zero .nesting values mean the same thing to
>> the machinery, thanks to lazy initialization of the .objdir.
>
> In the original code, if no transaction was setup prior to invoking
> index_blob_bulk_checkin() (i.e. nesting == 0), the object would be
> written to its own packfile and flushed immediately. If a transaction
> had been started further upstream (i.e nesting > 0), the object would be
> written to a packfile, but not flushed. This allowed for subseqent calls
> to index_blob_bulk_checkin() to write objects to the same packfile. Only
> when transaction ended would the packfile be flushed.
>
> With this change, index_blob_bulk_checkin() must be provided a
> transaction so it can write the object to the packfile. As there is now
> always a transaction involved, there is no longer any automatic packfile
> flushing. The caller is required to ensure a transaction handling as
> appropriate.
>
>> So, isn't the comment above the function now less accurate than
>> before?  The caller of this function does not have to do anything
>> and we do not expect the caller to "ensure a valid transaction" at
>> all, no?
>
> I'm not quite sure I follow. index_blob_bulk_checkin() now expects a
> transaction to be setup even if we intend to only write a single object.
> Thus the call site in index_fd() is adjusted to ensure there is a
> transaction via invoking begin_odb_transaction() and
> end_odb_transaction() before/after the function respectively.
>
> Just to clarify, are we talking about the comment above
> index_blob_bulk_checkin()?

Ahh, sorry, I misread the patch and missed the hunk/file boundary;
iow, thought the comment was about index_fd(), which was the source
of my confusion.

Thanks for correcting me.

  reply	other threads:[~2025-08-22 20:33 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20 22:55 [PATCH 0/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 1/3] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-20 22:55 ` [PATCH 2/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 3/3] bulk-checkin: wire repository variable Justin Tobler
2025-08-21  0:15   ` Junio C Hamano
2025-08-21 20:26     ` Justin Tobler
2025-08-21 20:32       ` Junio C Hamano
2025-08-21  0:00 ` [PATCH 0/3] bulk-checkin: remove global transaction state Junio C Hamano
2025-08-21 23:22 ` [PATCH v2 0/4] " Justin Tobler
2025-08-21 23:22   ` [PATCH v2 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-21 23:22   ` [PATCH v2 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 16:37     ` Junio C Hamano
2025-08-22 18:07       ` Justin Tobler
2025-08-22 20:25         ` Junio C Hamano
2025-08-21 23:22   ` [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 16:49     ` Junio C Hamano
2025-08-22 19:13       ` Justin Tobler
2025-08-22 20:33         ` Junio C Hamano [this message]
2025-08-21 23:22   ` [PATCH v2 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-22 17:03     ` Junio C Hamano
2025-08-22 19:38       ` Justin Tobler
2025-08-22 21:34   ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-22 21:34     ` [PATCH v3 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 21:35     ` [PATCH v3 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-25 20:25     ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa53rw0ye.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).