Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 5/6] odb/transaction: add transaction env interface
From: Justin Tobler @ 2026-06-29 19:20 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <aju-_Nf3kmoIidue@pks.im>

On 26/06/24 01:26PM, Patrick Steinhardt wrote:
> On Tue, Jun 23, 2026 at 11:19:19PM -0500, Justin Tobler wrote:
> > The ODB transaction backend is responsible for creating/managing its own
> > staging area for writing objects. Other child processes spawned by Git
> > may need to access to uncommitted objects or write new objects in the
> 
> s/may need to access to/may need access to/

Will fix.

> > staging area though.
> > 
> > Introduce `odb_transaction_env()` which is expected to provide the set
> > of environment variables needed by a child process to access the
> > transaction staging area.
> 
> Possessive s is missing, I think.

Yes. :)

[snip]
> > +
> > +	/*
> > +	 * This callback is expected to return a NULL-terminated array of
> > +	 * environment variables that a child process should inherit so
> > +	 * that its object writes participate in the transaction. The
> > +	 * returned array is owned by the backend and remains valid until
> > +	 * the transaction ends. May return NULL when the backend does not
> > +	 * need to expose any state to child processes.
> > +	 */
> > +	const char **(*env)(struct odb_transaction *transaction);
> 
> Would it make more sense to adapt this function so that:
> 
>   - It receives a `struct strvec` as input that the environment
>     variables are to be amended to.
> 
>   - It returns a normal error code to indicate errors?

Ya, that is probably a more sensible interface as it would be nice to be
able to signal an error. Will do in the next version.

Thanks,
-Justin

^ permalink raw reply

* Re: [PATCH v5 0/4] history: add squash subcommand to fold a range
From: Phillip Wood @ 2026-06-29 19:48 UTC (permalink / raw)
  To: Harald Nordgren
  Cc: Patrick Steinhardt, phillip.wood,
	Harald Nordgren via GitGitGadget, git
In-Reply-To: <CAHwyqnWQmObWr3N81_EU6F13iyKp3FfY8KSNFfoAjS4r_0qJrQ@mail.gmail.com>

On 29/06/2026 19:03, Harald Nordgren wrote:
>> So instead of
>>
>>       # This is the combination of 4 commits
>>       # This is the first commit message
>>       Base subject
>>
>>       Base body
>>
>>       # This is the second commit message
>>       # Another subject
>>
>>       # Another body
>>
>>       # This is the third commit message
>>       # fixup! Base subject
>>
>>       # This is the fourth commit message
>>       # amend! Another subject
>>       A better subject
>>
>>       A better body
>>
>> We'd have
>>
>>       # This is the combination of 4 commits
>>       # 123 Base subject
>>       # 456 Another subject
>>       # 789 fixup! Base subject
>>       # abc amend! Another subject
>>
>>       Base Subject
>>
>>       Base Body
>>
>>       Another subject
>>
>>       Another Body
> 
> I think this makes it a lot harder to read. If every commit body was
> always just a single paragraph it could make sense, but it's generally
> not. Look at the commits in this series, with no delimiter of where
> one commit message ends and the next one starts, it would be very
> confusing.

You've trimmed the line where I said

 >> Possibly with a comment before each message saying where it came
 >> from.

So I'm not against adding a comment before each message, but I do think 
we should omit any messages that would be commented out completely. If 
you look at the rebase example above you can see there is a mass of 
comments between the two pieces of text that make up the new message. 
That makes it hard to see what is actually going to be included in the 
new message.

Thanks

Phillip

>> It would be good to error out if the user tries squash a fixup! style
>> commit and range does not contain its target commit.
> 
> Good point, I don't see a good reason to allow this.
> 
>> There does seem to be some support for merges in this patch series which
>> I think behaves pretty sensibly. If we have
>>
>>             C - D
>>            /     \
>>     - A - B - E - F - G
>>
>> Then squashing A..G should be fine because the parents of F are in the
>> range and it looks like we support that. Squashing should B..G without
>> --ancestry-path should be safe as well because B ends up as the parent
>> of the squashed commit but we don't have a way to disable
>> --ancestry-path (and maybe we don't want to add one). Squashing F^@..G
>> might be useful to fixup a merge (though perhaps amending F rather than
>> creating G is a simpler way to fix a broken merge). Squashing E..G does
>> not make sense because the range does not include one of the merge parents.
> 
> Thanks for a good explanation here!
> 
> 
> Harald
> 


^ permalink raw reply

* Re: [PATCH v7 11/11] builtin/history: implement "drop" subcommand
From: Junio C Hamano @ 2026-06-29 19:49 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Pablo Sabater, Kristoffer Haugsbakk, Phillip Wood,
	Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-11-6e9392a957d8@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> +static int find_head_tree_change(struct repository *repo,
> +				 const struct replay_result *result,
> +				 struct commit **old_head,
> +				 struct commit **new_head,
> +				 bool *changed)
> +{
> +	const struct replay_ref_update *head_update = NULL;
> +	struct commit *old_head_commit, *new_head_commit;
> +	struct tree *old_head_tree, *new_head_tree;
> +	const char *head_target;
> +	int head_flags;
> +
> +	*changed = false;
> +
> +	head_target = refs_resolve_ref_unsafe(get_main_ref_store(repo),
> +					      "HEAD", RESOLVE_REF_NO_RECURSE,
> +					      NULL, &head_flags);
> +	if (!head_target)
> +		return error(_("cannot look up HEAD"));

Here head_target would be something like "refs/heads/master", or
whatever the "HEAD" happens to point at.

> +	if (!(head_flags & REF_ISSYMREF))
> +		head_target = "HEAD";

But if it is not a symref, then it is a detached HEAD.  We manually
set it to "HEAD" again.  We know head_target was not NULL, so what
did we receive in it from refs_resolve_ref_unsafe() call, before we
overwrite it here?

> +	for (size_t i = 0; i < result->updates_nr; i++) {
> +		if (!strcmp(result->updates[i].refname, head_target)) {
> +			head_update = &result->updates[i];
> +			break;
> +		}
> +	}

We see if we are going to update the currently checked out branch.

The .updates[] array was earlier populated by the caller that called
compute_pending_ref_updates() before it called us.

> +	if (!head_update)
> +		return 0;

And we return if not.  That is a simple and happy case.  Otherwise
we'd tell the caller about the updated commit and tree (strictly
speaking that would be redundant, but since we have it already, it
is better to give them to the caller instead of forcing the caller
to unwrap the commit).

> +static int cmd_history_drop(int argc,
> +			    const char **argv,
> +			    const char *prefix,
> +			    struct repository *repo)
> +{
> +...
> +	struct option options[] = {
> +		OPT_CALLBACK_F(0, "update-refs", &action, "(branches|head)",
> +			       N_("control which refs should be updated"),
> +			       PARSE_OPT_NONEG, parse_ref_action),
> ...
> +		OPT_END(),
> +	};
> ...
> +	ret = compute_pending_ref_updates(&revs, action, original, rewritten,
> +					  empty, &result);

Here we call the function.  When action is "--update-refs=head",
doesn't this code in compute_pending_ref_updates() ... 

		if (action == REF_ACTION_HEAD &&
		    decoration->type != DECORATION_REF_HEAD)
			continue;

... skip any reference name (loaded by load_ref_decorations() lazily
by calling get_name_decoration()) that is not "HEAD", which would
mean we end up not finding any hits in the find_head_tree_change()
function call we make later ...

> +	if (ret) {
> +		ret = error(_("failed replaying descendants"));
> +		goto out;
> +	}
> + ...
> +	if (!is_bare_repository()) {
> +		ret = find_head_tree_change(repo, &result, &old_head,
> +					    &new_head, &head_moves);

... here?

^ permalink raw reply

* Re: [PATCH 6/6] builtin/receive-pack: stage incoming objects via ODB transactions
From: Justin Tobler @ 2026-06-29 20:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <aju_AmlKVi5UZaiQ@pks.im>

On 26/06/24 01:26PM, Patrick Steinhardt wrote:
> On Tue, Jun 23, 2026 at 11:19:20PM -0500, Justin Tobler wrote:
> > Objects received by git-receive-pack(1) are quarantined in a temporary
> > "incoming" directory and migrated into the object database prior to the
> > reference updates. The quarantine is currently managed through
> > `tmp_objdir` directly. In a pluggable ODB future, how exactly an object
> > gets written to a transaction may vary for a given ODB source. Refactor
> > git-receive-pack(1) to use the ODB transaction interfaces to manage the
> > object staging area in a more agnostic manner accordingly.
> > 
> > Note that the temporary directory created for git-receive-pack(1) is
> > eagerly created and uses a different prefix name. This behavior is
> 
> A different prefix name compared to what?

Currently the temporary directories created for ODB transactions all use
the prefix "bulk-fsync". The temp dir created by git-receive-pack(1) is
expected to have the prefix "incoming".

> > special cased in the "files" backend by having `odb_transaction_begin()`
> > callers that require this behavior provide an `ODB_TRANSACTION_RECEIVE`
> > flag.
> 
> Okay. I guess this is to retain existing behaviour where the temporary
> directory is created lazily everywhere else. Makes me wonder whether we
> should eventually change this to just unconditionally create the
> directory in all cases so that we can drop this new flag.

It would be nice to not have to have a flag here, but if we want to also
keep the existing temp dir prefixes, we would also need to keep the
flags.

> It might've also made sense to split this commit up into two: one to
> introduce the flag parameter, and then one to do the changes to
> git-receive-pack(1).

Ya, I think you are right. I actually originally had it this way but
squashed these two commits together for some reason. I'll split them
back up in the next version.

> > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> > index 19eb6a1b61..ee8e03e2ab 100644
> > --- a/builtin/receive-pack.c
> > +++ b/builtin/receive-pack.c
> > @@ -112,8 +112,6 @@ static enum {
> >  } use_keepalive;
> >  static int keepalive_in_sec = 5;
> >  
> > -static struct tmp_objdir *tmp_objdir;
> > -
> >  static struct proc_receive_ref {
> >  	unsigned int want_add:1,
> >  		     want_delete:1,
> 
> I assume the goal is that we convert all other users of the tmp-objdir
> subsystem to also use transactions eventually, so that this becomes an
> implementation detail fo the files transaction?

Yes, that is exactly my plan :)

> > @@ -2106,14 +2104,13 @@ static void execute_commands(struct command *commands,
> >  	 * Now we'll start writing out refs, which means the objects need
> >  	 * to be in their final positions so that other processes can see them.
> >  	 */
> > -	if (tmp_objdir_migrate(tmp_objdir) < 0) {
> > +	if (odb_transaction_commit(the_repository->objects->transaction)) {
> >  		for (cmd = commands; cmd; cmd = cmd->next) {
> >  			if (!cmd->error_string)
> >  				cmd->error_string = "unable to migrate objects to permanent storage";
> >  		}
> >  		return;
> >  	}
> > -	tmp_objdir = NULL;
> 
> We don't need to unset the transaction because that's what
> `odb_transaction_commit()` already does for us, I assume?

That is correct. The tmp_objdir global is actually removed and now
competely managed by the ODB transaction. We can probably actuall remove
the "tmp-objdir" include now too. I'll do that in the next version.

> > @@ -2326,7 +2323,8 @@ static void push_header_arg(struct strvec *args, struct pack_header *hdr)
> >  		     ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
> >  }
> >  
> > -static const char *unpack(int err_fd, struct shallow_info *si)
> > +static const char *unpack(int err_fd, struct shallow_info *si,
> > +			  struct odb_transaction *transaction)
> >  {
> >  	struct pack_header hdr;
> >  	const char *hdr_err;
> 
> It feels a bit weird that we sometimes pass the transaction as
> parameter, whereas othertimes we access it via `the_repository`.

That's fair. I was trying to avoid the churn of wiring to all its
callsites, but it's probably best to be consistent. Maybe it would be
fine to just create a transaction global like we do for the reference
transaction?

> > @@ -2351,20 +2349,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
> >  		strvec_push(&child.args, alt_shallow_file);
> >  	}
> >  
> > -	tmp_objdir = tmp_objdir_create(the_repository, "incoming");
> > -	if (!tmp_objdir) {
> > -		if (err_fd > 0)
> > -			close(err_fd);
> > -		return "unable to create temporary object directory";
> > -	}
> > -	strvec_pushv(&child.env, tmp_objdir_env(tmp_objdir));
> > -
> > -	/*
> > -	 * Normally we just pass the tmp_objdir environment to the child
> > -	 * processes that do the heavy lifting, but we may need to see these
> > -	 * objects ourselves to set up shallow information.
> > -	 */
> > -	tmp_objdir_add_as_alternate(tmp_objdir);
> > +	strvec_pushv(&child.env, odb_transaction_env(transaction));
> 
> Interesting, this here seems like a change in behaviour. Previously we
> added the transactions as an alternate, but now we only propagate it via
> the environment. I didn't see this mentioned in the commit message.

That's not quite correct. By using the ODB transaction interface, the
underlying tmp_objdir is managed transparently. When we begin the ODB
transaction, the temp directory that gets created is automatically
created and applied as the primary ODB. This ultimately produces an
equivalent result, its just now set up when the transaction is started
(which happens a little bit earlier).

The only behavior change here is that the temp directory is being
applied as the primary instead of an alternate in the main
git-receive-pack(1) process. Since all writes though are handled by the
child processes that get spawned, this shouldn't really matter though.

This is certainly rather confusing though, and should be mentioned in
the commit message. I'll do this in the next version.

> > @@ -2707,7 +2694,10 @@ int cmd_receive_pack(int argc,
> >  		if (!si.nr_ours && !si.nr_theirs)
> >  			shallow_update = 0;
> >  		if (!delete_only(commands)) {
> > -			unpack_status = unpack_with_sideband(&si);
> > +			if (odb_transaction_begin(the_repository->objects, &transaction, ODB_TRANSACTION_RECEIVE))
> > +				unpack_status = "unable to start ODB transaction";
> 
> s/ODB/object/
> 
> This may be visible to the user, and "ODB" may mean nothing to them.

Will change in the next version.

> > diff --git a/object-file.c b/object-file.c
> > index 14064d188a..e7958753ec 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1702,7 +1703,8 @@ static const char **odb_transaction_files_env(struct odb_transaction *base)
> >  }
> >  
> >  int odb_transaction_files_begin(struct odb_source *source,
> > -				struct odb_transaction **out)
> > +				struct odb_transaction **out,
> > +				enum odb_transaction_flags flags)
> >  {
> >  	struct odb_transaction_files *transaction;
> >  	struct object_database *odb = source->odb;
> > @@ -1717,6 +1719,20 @@ int odb_transaction_files_begin(struct odb_source *source,
> >  	transaction->base.commit = odb_transaction_files_commit;
> >  	transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
> >  	transaction->base.env = odb_transaction_files_env;
> > +
> > +	transaction->prefix = "bulk-fsync";
> > +	if (flags & ODB_TRANSACTION_RECEIVE) {
> > +		/*
> > +		 * ODB transactions for git-receive-pack(1) eagerly create a
> > +		 * temporary directory and use a different prefix.
> > +		 */
> > +		transaction->prefix = "incoming";
> > +		if (odb_transaction_files_prepare(&transaction->base)) {
> > +			free(transaction);
> > +			return -1;
> > +		}
> > +	}
> > +
> 
> Okay, makes sense. I really wonder whether we need to insist this much
> on the exact name used by this, but better be safe than sorry for now I
> guess.
> 
> And as mentioned before, I also wonder whether it really makes sense to
> have the lazy creation of the tmp-objdir. Maybe add a NEEDSWORK item
> here that mentions we want to investigate whether this is even needed at
> all?

Ya, I was worried that there would be some reason the temp dir prefix
should remain the same or there would be some performance reason to
lazily create directories. I didn't investigate too deeply though. I'll
add a NEEDSWORK comment to make note of this.

> > diff --git a/odb/transaction.h b/odb/transaction.h
> > index 536458297b..78392ff13d 100644
> > --- a/odb/transaction.h
> > +++ b/odb/transaction.h
> > @@ -44,6 +43,10 @@ struct odb_transaction {
> >  	const char **(*env)(struct odb_transaction *transaction);
> >  };
> >  
> > +enum odb_transaction_flags {
> > +	ODB_TRANSACTION_RECEIVE = (1 << 0),
> > +};
> 
> It's not clear at all what this flag does based on its name, so we
> should have documentation for it.

Will update.

Thanks,
-Justin

^ permalink raw reply

page:              | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox