Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org,  ps@pks.im
Subject: Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()
Date: Fri, 22 Aug 2025 13:33:45 -0700	[thread overview]
Message-ID: <xmqqa53rw0ye.fsf@gitster.g> (raw)
In-Reply-To: <p6bdiflg2fe7kgw47ntg46csjhmp63gkovtf4inlc42f4dga77@5oieqa4bydne> (Justin Tobler's message of "Fri, 22 Aug 2025 14:13:27 -0500")

Justin Tobler <jltobler@gmail.com> writes:

> On 25/08/22 09:49AM, Junio C Hamano wrote:
>> Justin Tobler <jltobler@gmail.com> writes:
>> 
>> >  /*
>> > - * This creates one packfile per large blob unless bulk-checkin
>> > - * machinery is "plugged".
>> > + * This writes the specified object to a packfile. Objects written here
>> > + * during the same transaction are written to the same packfile. The
>> > + * packfile is not flushed until the transaction is flushed. The caller
>> > + * is expected to ensure a valid transaction is setup for objects to be
>> > + * recorded to.
>> >   *
>> >   * This also bypasses the usual "convert-to-git" dance, and that is on
>> >   * purpose. We could write a streaming version of the converting
>> > diff --git a/object-file.c b/object-file.c
>> > index 1740aa2b2e3..bc15af42450 100644
>> > --- a/object-file.c
>> > +++ b/object-file.c
>> > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid,
>> >  	 * Call xsize_t() only when needed to avoid potentially unnecessary
>> >  	 * die() for large files.
>> >  	 */
>> > -	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path))
>> > +	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) {
>> >  		ret = index_stream_convert_blob(istate, oid, fd, path, flags);
>> > -	else if (!S_ISREG(st->st_mode))
>> > +	} else if (!S_ISREG(st->st_mode)) {
>> >  		ret = index_pipe(istate, oid, fd, type, path, flags);
>> > -	else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > -		 type != OBJ_BLOB ||
>> > -		 (path && would_convert_to_git(istate, path)))
>> > +	} else if ((st->st_size >= 0 &&
>> > +		    (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > +		   type != OBJ_BLOB ||
>> > +		   (path && would_convert_to_git(istate, path))) {
>> >  		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
>> >  				 type, path, flags);
>> > -	else
>> > -		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
>> > +	} else {
>> > +		struct odb_transaction *transaction;
>> > +
>> > +		transaction = begin_odb_transaction(the_repository->objects);
>> > +		ret = index_blob_bulk_checkin(transaction,
>> >  					      oid, fd, xsize_t(st->st_size),
>> >  					      path, flags);
>> > +		end_odb_transaction(transaction);
>> > +	}
>> 
>> Interesting.  If the caller does the odb transaction management
>> itself by calling begin/end before calling this function and fed two
>> or more large objects, the original code did the right thing with
>> .nesting set to 1.  In the new code, it still does the right thing
>> even during the call to index_blob_bulk_checkin we raise .nesting by
>> one, because the all non-zero .nesting values mean the same thing to
>> the machinery, thanks to lazy initialization of the .objdir.
>
> In the original code, if no transaction was setup prior to invoking
> index_blob_bulk_checkin() (i.e. nesting == 0), the object would be
> written to its own packfile and flushed immediately. If a transaction
> had been started further upstream (i.e nesting > 0), the object would be
> written to a packfile, but not flushed. This allowed for subseqent calls
> to index_blob_bulk_checkin() to write objects to the same packfile. Only
> when transaction ended would the packfile be flushed.
>
> With this change, index_blob_bulk_checkin() must be provided a
> transaction so it can write the object to the packfile. As there is now
> always a transaction involved, there is no longer any automatic packfile
> flushing. The caller is required to ensure a transaction handling as
> appropriate.
>
>> So, isn't the comment above the function now less accurate than
>> before?  The caller of this function does not have to do anything
>> and we do not expect the caller to "ensure a valid transaction" at
>> all, no?
>
> I'm not quite sure I follow. index_blob_bulk_checkin() now expects a
> transaction to be setup even if we intend to only write a single object.
> Thus the call site in index_fd() is adjusted to ensure there is a
> transaction via invoking begin_odb_transaction() and
> end_odb_transaction() before/after the function respectively.
>
> Just to clarify, are we talking about the comment above
> index_blob_bulk_checkin()?

Ahh, sorry, I misread the patch and missed the hunk/file boundary;
iow, thought the comment was about index_fd(), which was the source
of my confusion.

Thanks for correcting me.

next prev parent reply	other threads:[~2025-08-22 20:33 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20 22:55 [PATCH 0/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 1/3] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-20 22:55 ` [PATCH 2/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 3/3] bulk-checkin: wire repository variable Justin Tobler
2025-08-21  0:15   ` Junio C Hamano
2025-08-21 20:26     ` Justin Tobler
2025-08-21 20:32       ` Junio C Hamano
2025-08-21  0:00 ` [PATCH 0/3] bulk-checkin: remove global transaction state Junio C Hamano
2025-08-21 23:22 ` [PATCH v2 0/4] " Justin Tobler
2025-08-21 23:22   ` [PATCH v2 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-21 23:22   ` [PATCH v2 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 16:37     ` Junio C Hamano
2025-08-22 18:07       ` Justin Tobler
2025-08-22 20:25         ` Junio C Hamano
2025-08-21 23:22   ` [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 16:49     ` Junio C Hamano
2025-08-22 19:13       ` Justin Tobler
2025-08-22 20:33         ` Junio C Hamano [this message]
2025-08-21 23:22   ` [PATCH v2 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-22 17:03     ` Junio C Hamano
2025-08-22 19:38       ` Justin Tobler
2025-08-22 21:34   ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-22 21:34     ` [PATCH v3 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34     ` [PATCH v3 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 21:35     ` [PATCH v3 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-25 20:25     ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa53rw0ye.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.