From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, ps@pks.im
Subject: Re: [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin()
Date: Fri, 22 Aug 2025 13:33:45 -0700 [thread overview]
Message-ID: <xmqqa53rw0ye.fsf@gitster.g> (raw)
In-Reply-To: <p6bdiflg2fe7kgw47ntg46csjhmp63gkovtf4inlc42f4dga77@5oieqa4bydne> (Justin Tobler's message of "Fri, 22 Aug 2025 14:13:27 -0500")
Justin Tobler <jltobler@gmail.com> writes:
> On 25/08/22 09:49AM, Junio C Hamano wrote:
>> Justin Tobler <jltobler@gmail.com> writes:
>>
>> > /*
>> > - * This creates one packfile per large blob unless bulk-checkin
>> > - * machinery is "plugged".
>> > + * This writes the specified object to a packfile. Objects written here
>> > + * during the same transaction are written to the same packfile. The
>> > + * packfile is not flushed until the transaction is flushed. The caller
>> > + * is expected to ensure a valid transaction is setup for objects to be
>> > + * recorded to.
>> > *
>> > * This also bypasses the usual "convert-to-git" dance, and that is on
>> > * purpose. We could write a streaming version of the converting
>> > diff --git a/object-file.c b/object-file.c
>> > index 1740aa2b2e3..bc15af42450 100644
>> > --- a/object-file.c
>> > +++ b/object-file.c
>> > @@ -1253,19 +1253,26 @@ int index_fd(struct index_state *istate, struct object_id *oid,
>> > * Call xsize_t() only when needed to avoid potentially unnecessary
>> > * die() for large files.
>> > */
>> > - if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path))
>> > + if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) {
>> > ret = index_stream_convert_blob(istate, oid, fd, path, flags);
>> > - else if (!S_ISREG(st->st_mode))
>> > + } else if (!S_ISREG(st->st_mode)) {
>> > ret = index_pipe(istate, oid, fd, type, path, flags);
>> > - else if ((st->st_size >= 0 && (size_t) st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > - type != OBJ_BLOB ||
>> > - (path && would_convert_to_git(istate, path)))
>> > + } else if ((st->st_size >= 0 &&
>> > + (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) ||
>> > + type != OBJ_BLOB ||
>> > + (path && would_convert_to_git(istate, path))) {
>> > ret = index_core(istate, oid, fd, xsize_t(st->st_size),
>> > type, path, flags);
>> > - else
>> > - ret = index_blob_bulk_checkin(the_repository->objects->transaction,
>> > + } else {
>> > + struct odb_transaction *transaction;
>> > +
>> > + transaction = begin_odb_transaction(the_repository->objects);
>> > + ret = index_blob_bulk_checkin(transaction,
>> > oid, fd, xsize_t(st->st_size),
>> > path, flags);
>> > + end_odb_transaction(transaction);
>> > + }
>>
>> Interesting. If the caller does the odb transaction management
>> itself by calling begin/end before calling this function and fed two
>> or more large objects, the original code did the right thing with
>> .nesting set to 1. In the new code, it still does the right thing
>> even during the call to index_blob_bulk_checkin we raise .nesting by
>> one, because the all non-zero .nesting values mean the same thing to
>> the machinery, thanks to lazy initialization of the .objdir.
>
> In the original code, if no transaction was setup prior to invoking
> index_blob_bulk_checkin() (i.e. nesting == 0), the object would be
> written to its own packfile and flushed immediately. If a transaction
> had been started further upstream (i.e nesting > 0), the object would be
> written to a packfile, but not flushed. This allowed for subseqent calls
> to index_blob_bulk_checkin() to write objects to the same packfile. Only
> when transaction ended would the packfile be flushed.
>
> With this change, index_blob_bulk_checkin() must be provided a
> transaction so it can write the object to the packfile. As there is now
> always a transaction involved, there is no longer any automatic packfile
> flushing. The caller is required to ensure a transaction handling as
> appropriate.
>
>> So, isn't the comment above the function now less accurate than
>> before? The caller of this function does not have to do anything
>> and we do not expect the caller to "ensure a valid transaction" at
>> all, no?
>
> I'm not quite sure I follow. index_blob_bulk_checkin() now expects a
> transaction to be setup even if we intend to only write a single object.
> Thus the call site in index_fd() is adjusted to ensure there is a
> transaction via invoking begin_odb_transaction() and
> end_odb_transaction() before/after the function respectively.
>
> Just to clarify, are we talking about the comment above
> index_blob_bulk_checkin()?
Ahh, sorry, I misread the patch and missed the hunk/file boundary;
iow, thought the comment was about index_fd(), which was the source
of my confusion.
Thanks for correcting me.
next prev parent reply other threads:[~2025-08-22 20:33 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-20 22:55 [PATCH 0/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 1/3] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-20 22:55 ` [PATCH 2/3] bulk-checkin: remove global transaction state Justin Tobler
2025-08-20 22:55 ` [PATCH 3/3] bulk-checkin: wire repository variable Justin Tobler
2025-08-21 0:15 ` Junio C Hamano
2025-08-21 20:26 ` Justin Tobler
2025-08-21 20:32 ` Junio C Hamano
2025-08-21 0:00 ` [PATCH 0/3] bulk-checkin: remove global transaction state Junio C Hamano
2025-08-21 23:22 ` [PATCH v2 0/4] " Justin Tobler
2025-08-21 23:22 ` [PATCH v2 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-21 23:22 ` [PATCH v2 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 16:37 ` Junio C Hamano
2025-08-22 18:07 ` Justin Tobler
2025-08-22 20:25 ` Junio C Hamano
2025-08-21 23:22 ` [PATCH v2 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 16:49 ` Junio C Hamano
2025-08-22 19:13 ` Justin Tobler
2025-08-22 20:33 ` Junio C Hamano [this message]
2025-08-21 23:22 ` [PATCH v2 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-22 17:03 ` Junio C Hamano
2025-08-22 19:38 ` Justin Tobler
2025-08-22 21:34 ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34 ` [PATCH v3 1/4] bulk-checkin: introduce object database transaction structure Justin Tobler
2025-08-22 21:34 ` [PATCH v3 2/4] bulk-checkin: remove global transaction state Justin Tobler
2025-08-22 21:34 ` [PATCH v3 3/4] bulk-checkin: require transaction for index_blob_bulk_checkin() Justin Tobler
2025-08-22 21:35 ` [PATCH v3 4/4] bulk-checkin: use repository variable from transaction Justin Tobler
2025-08-25 20:25 ` [PATCH v3 0/4] bulk-checkin: remove global transaction state Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa53rw0ye.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).