Git development
* Re: [PATCH v2] checkout/switch: disallow checking out same branch in multiple worktrees
From: Junio C Hamano @ 2023-01-18 22:55 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: git, pclouds, Jinwook Jeong, Rubén Justo, Eric Sunshine
In-Reply-To: <20230118061527.76218-1-carenas@gmail.com>

Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> Changes since v1
> * A much better commit message
> * Changes to the tests as suggested by Eric
> * Changes to the logic as suggested by Rubén

I queued this topic at the tip of 'seen' as 2fe0b4e3 (Merge branch
'cb/checkout-same-branch-twice' into seen, 2023-01-18), on top of
4ea8693b (Merge branch 'mc/credential-helper-auth-headers' into
seen, 2023-01-18).

 - 4ea8693b - https://github.com/git/git/actions/runs/3952916442
 - 2fe0b4e3 - https://github.com/git/git/actions/runs/3953521066

Comparing these two runs, inclusion of this topic seems to introduce
new leaks, as t1408 and t2018 (neither of which was touched by this
topic) that used to pass are now failing.

>  builtin/checkout.c      | 24 +++++++++++++++++-------
>  t/t2400-worktree-add.sh | 18 ++++++++++++++++--
>  2 files changed, 33 insertions(+), 9 deletions(-)

Thanks.


* Re: [PATCH v8 3/4] worktree add: add --orphan flag
From: 'Jacob Abel' @ 2023-01-18 22:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: rsbecker, phillip.wood, git,
	'Ævar Arnfjörð Bjarmason',
	'Eric Sunshine', 'Phillip Wood',
	'Rubén Justo', 'Taylor Blau'
In-Reply-To: <xmqq8ri4ihjg.fsf@gitster.g>

On 23/01/14 07:49PM, Junio C Hamano wrote:
> <rsbecker@nexbridge.com> writes:
>
> > [...]
>
> An orphan is not even detached, if I understand correctly.
>
> The state is what is called "being on an unborn branch", where your
> HEAD does not even point at any commit.  HEAD only knows a name of a
> branch that is not yet created but will be when you make a commit.
>
> While "(HEAD) being detached" means that you are on an existing
> commit---it is just that future history you extend by making a
> commit from that state will not be on any branch.
>
> So if we wanted to fix the misnomer, s/orphan/unborn/ would be how I
> would go about it.
>

I would support making this change (s/orphan/unborn/) as it's definitely less
confusing. Especially given that orphan already has a completely different,
overloaded meaning when referring to orphaned objects & commits (in the context
of garbage collection).
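
To make the distinction concrete, here is a minimal sketch (not part of the
series) that classifies HEAD as being on a branch, on an unborn branch, or
detached. The temporary repository, the branch name "main" and the head_state
helper are made up for illustration; the sketch only relies on
`git symbolic-ref -q HEAD` failing when HEAD is detached and on
`git rev-parse --verify -q HEAD` failing when HEAD does not resolve to a commit.

```shell
# Illustrative throwaway repository; the path and branch name are assumptions.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main

# head_state <repo>: report how HEAD currently relates to branches.
head_state () {
	if branch=$(git -C "$1" symbolic-ref -q HEAD)
	then
		# HEAD names a branch; check whether that branch has a commit yet.
		if git -C "$1" rev-parse --verify -q HEAD >/dev/null
		then
			echo "on branch ${branch#refs/heads/}"
		else
			echo "unborn branch ${branch#refs/heads/}"
		fi
	else
		# HEAD is not a symbolic ref, so it points directly at a commit.
		echo "detached"
	fi
}

state_unborn=$(head_state "$repo")

git -C "$repo" -c user.name="A U Thor" -c user.email=author@example.com \
	commit -q --allow-empty -m initial
state_born=$(head_state "$repo")

git -C "$repo" checkout -q --detach
state_detached=$(head_state "$repo")

printf '%s\n' "$state_unborn" "$state_born" "$state_detached"
rm -rf "$repo"
```

The first state is the one `--orphan` produces: HEAD knows a branch name, but
there is no commit behind it yet.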



* Re: [PATCH v8 3/4] worktree add: add --orphan flag
From: Jacob Abel @ 2023-01-18 22:40 UTC (permalink / raw)
  To: phillip.wood
  Cc: git, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Junio C Hamano, Phillip Wood, Rubén Justo, Taylor Blau,
	rsbecker
In-Reply-To: <70a01a52-f16c-e85f-297e-c42a23f95a9a@dunelm.org.uk>

On 23/01/16 10:47AM, Phillip Wood wrote:
> Hi Jacob
>
> On 14/01/2023 22:47, Jacob Abel wrote:
> > On 23/01/13 10:20AM, Phillip Wood wrote:
> >> Hi Jacob
> >>
> > [...]
> >
> > I'll reply to this message with the one-off patch to get feedback. Since this is
> > essentially a discrete change on top of v8, I can either keep it as a separate
> > patch or reroll depending on how much needs to be changed (and what would be
> > easier for everyone).
> >
> >> 	git worktree add --orphan -b topic main
> >> 	git worktree add --orphan -B topic main
> >
> > I am hesitant to add these as they break away from the syntax used in
> > `git switch` and `git checkout`.
>
> When I wrote my original email I wrongly thought that --orphan did not
> take an argument for "git checkout". While I think it is a mistake for
> checkout and switch to have --orphan take an argument they do at least
> always need a branch name with that option. "git worktree add" already
> has the branch name in the form of the worktree directory in the common
> case.

Understood.

I'm not entirely opposed to making this change to OPT_BOOL but I have to wonder
how often `--orphan` will actually be used by a given user and whether the
slightly shorter invocation will be used regularly.

With the base `git worktree add $path`, the shorthand/DWIM makes sense as it's
used regularly but I don't see users working with `--orphan` outside of trying
to create the first branch in a repository.

And I'd like that operation of creating the first branch in a repo to eventually
"just work" with the base command, i.e. `git worktree add main/`. The reason I
hadn't yet added that is because I've yet to figure out how to get it to work
without accidentally introducing potentially confusing situations and I didn't
want to hold up introducing the core functionality itself.

Once that main use-case "just works", I don't see users utilising `--orphan`
except in very rare circumstances. Doubly so since the average user likely
shouldn't be using `--orphan` in most cases.

Hence the question of whether this change would be worth it vs the existing
`--orphan $branchname $path` which is (for better or worse) consistent with `-b`
and `-B`.

>
> > Also apologies for the tangent but while researching this path, I noticed that
> > --orphan behaves unexpectedly on both `git switch` and `git checkout` when mixed
> > with `-c` and `-b` respectively.
> >
> >      % git switch --orphan -c foobar
> >      fatal: invalid reference: foobar
> >
> >      % git switch -c --orphan foobar
> >      fatal: invalid reference: foobar
> >      % git checkout -b --orphan foobar
> >      fatal: 'foobar' is not a commit and a branch '--orphan' cannot be created from it
> >
> >      % git checkout --orphan -b foobar
> >      fatal: 'foobar' is not a commit and a branch '-b' cannot be created from it
>
> The messages for checkout look better than the switch ones to me as they
> show the branch name which makes it clearer that we're treating what
> looks like an option as an argument. What in particular is unexpected
> here - --orphan and -b take an argument so they'll hoover up the next
> thing on the commandline whatever it is.
>
> Best Wishes
>
> Phillip
>
> > [...]

Agreed. I wasn't sure if this would be something worth addressing in a patch but
at the very least I can work on putting together a small patch for `git switch`
since it doesn't seem to be hoovering the flags like `git checkout` does.
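
As a generic illustration of the "hoovering" described above, here is a small
sketch using the POSIX shell `getopts` builtin. This is not Git's
parse-options code; the parse_demo function and its -b/-o options are
hypothetical stand-ins for an option pair that each take an argument.

```shell
# parse_demo: hypothetical parser where both -b and -o require an argument.
parse_demo () {
	OPTIND=1
	branch=
	orphan=
	while getopts "b:o:" opt
	do
		case "$opt" in
		b) branch=$OPTARG ;;
		o) orphan=$OPTARG ;;
		*) return 1 ;;
		esac
	done
	shift $((OPTIND - 1))
	echo "branch='$branch' orphan='$orphan' rest='$*'"
}

# -b consumes the next word as its argument, even though it looks like
# an option, so "-o" is never parsed as an option here.
parse_demo -b -o foobar
```

The word after -b is taken as the branch name, and "foobar" is left over as a
positional argument, which mirrors the shape of the `git checkout` errors
quoted above.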



* Re: [PATCH v8 3/4] worktree add: add --orphan flag
From: Jacob Abel @ 2023-01-18 22:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: phillip.wood, git, Ævar Arnfjörð Bjarmason,
	Eric Sunshine, Phillip Wood, Rubén Justo, Taylor Blau,
	rsbecker
In-Reply-To: <xmqqo7r0ijdv.fsf@gitster.g>

On 23/01/14 07:09PM, Junio C Hamano wrote:
> Jacob Abel <jacobabel@nullpo.dev> writes:
>
> >> 	git worktree add --orphan -b topic main
> >> 	git worktree add --orphan -B topic main
> >
> > I am hesitant to add these as they break away from the syntax used in
> > `git switch` and `git checkout`.
>
> Not that I care too deeply, but doesn't it introduce end-user
> confusion if we try to be compatible with "git checkout --orphan
> <branch>", while allowing this to be compatible with the default
> choice of the branch name done by "git worktree add"?  "--orphan" in
> "git checkout" behaves similar to "-b|-B" in that it always wants a
> name, but "git worktree add" wants to make it optional.

Yes. I think it's a fairly minor degree of confusion but I agree that it adds
potentially unneeded confusion.

>
> By the way "--orphan" in checkout|switch wants to take a name for
> itself, e.g.
>
> 	git checkout --orphan $name [$commit]
> 	git checkout -b $name [$commit]
> 	git checkout -B $name [$commit]
>
> so it is impossible to force their "--orphan" to rename an existing
> branch, which is probably a design mistake we may want to fix.

Can you elaborate on what you mean by "rename an existing branch" here?

Do you mean like `git checkout --orphan $branchname` being able to convert an
existing branch into an orphan/unborn branch?

Also a small point but in an earlier thread [1], we made the decision to model
functionality on `git switch --orphan $branch` instead of
`git checkout --orphan $branch [$commit]`.

>
> In any case, as I said, I do not care too deeply which way you guys
> decide to go, because I think the whole "orphan" UI is a design
> mistake that instills a broken mental model to its users [*].

Understood.

>
> But let's wait a bit more to see which among
>
> (1) git worktree add [[--orphan] -b $branch] $path
>     This allows --orphan to act as a modifier to existing -b,
>
> (2) git worktree add [(--orphan|-b) $branch] $path
>     This allows --orphan to be another mode of -b, or
>
> (3) git worktree add [--orphan [$branch]|(-b $branch)] $path
>     This allows --orphan to default to $(basename $path)
>
> people prefer.
>

I'd personally argue that option 2 (the current behavior) is probably the
cleanest path forward as option 3 requires a bit of awkward code [2] and
`--orphan` is such an esoteric option that the user may only use it once or
twice in the life of a given repository, if that.

And eventually I'd like `git worktree add $path` to "just work" on a new/empty
repository. However as things stand, there wasn't an easy way to do this without
leading to potentially confusing behavior. It can be done, I just haven't taken
the time to figure it out yet.

Once `git worktree add $path` "just works" (when creating the first branch in a
repo), I highly doubt anyone would use `--orphan` often enough to justify the
use of shorthand options 1 or 3.

>
> [Footnote]
>
> * I am not saying that it is wrong or useless to keep an unrelated
>   history, especially one that records trees that have no relevance
>   to the main history like created with "switch --orphan", in the
>   same repository.  Allowing "git switch --orphan" to create such a
>   separate history in the same repository blurs the distinction.  It
>   would help newbies to form the right mental model if they start a
>   separate repository that the separate history originates in, and
>   pull from it to bootstrap the unrelated history in the local
>   repository.

Definitely agreed that `--orphan` is esoteric and probably should be avoided by
most users where possible.

1. https://lore.kernel.org/git/CAPig+cSVzewXpk+eDSC-W-+Q8X_7ikZXXeSQbmpHBcdLCU5svw@mail.gmail.com/
2. https://lore.kernel.org/git/20230114224956.24801-1-jacobabel@nullpo.dev/



* Re: [PATCH] branch: improve advice when --recurse-submodules fails
From: Glen Choo @ 2023-01-18 21:58 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget, git; +Cc: Philippe Blain
In-Reply-To: <pull.1464.git.1673890908453.gitgitgadget@gmail.com>

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Philippe Blain <levraiphilippeblain@gmail.com>
>
> 'git branch --recurse-submodules start from-here' fails if any submodule
> present in 'from-here' is not yet cloned (under
> submodule.propagateBranches=true). We then give this advice:
>
>    "You may try updating the submodules using 'git checkout from-here && git submodule update --init'"
>
> If 'submodule.recurse' is set, 'git checkout from-here' will also fail since
> it will try to recursively checkout the submodules.

Ah, yes that is true.

> diff --git a/branch.c b/branch.c
> index d182756827f..e5614b53b36 100644
> --- a/branch.c
> +++ b/branch.c
> @@ -756,7 +756,7 @@ void create_branches_recursively(struct repository *r, const char *name,
>  				_("submodule '%s': unable to find submodule"),
>  				submodule_entry_list.entries[i].submodule->name);
>  			if (advice_enabled(ADVICE_SUBMODULES_NOT_UPDATED))
> -				advise(_("You may try updating the submodules using 'git checkout %s && git submodule update --init'"),
> +				advise(_("You may try updating the submodules using 'git checkout --no-recurse-submodules %s && git submodule update --init'"),
>  				       start_commitish);
>  			exit(code);
>  		}
>

Makes sense. Thanks!

Reviewed-by: Glen Choo <chooglen@google.com>


* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Taylor Blau @ 2023-01-18 21:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, René Scharfe,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <xmqqmt6f4l03.fsf@gitster.g>

On Wed, Jan 18, 2023 at 12:59:24PM -0800, Junio C Hamano wrote:
> The --literally option was invented initially primarily to allow a
> bogus type of object (e.g. "hash-object -t xyzzy --literally") but I
> am happy to see that we are finding different uses.  I wonder if
> these objects of known types but with syntactically bad contents can
> be "repack"ed from loose into packed?
>
> >   [5/6]: fsck: provide a function to fsck buffer without object struct

It is indeed possible:

--- >8 ---
Initialized empty Git repository in /home/ttaylorr/src/git/t/trash directory.t9999-test/.git/
expecting success of 9999.1 'repacking corrupt loose object into packed':
	name=$(echo $ZERO_OID | sed -e "s/00/Q/g") &&
	printf "100644 fooQ$name" | q_to_nul |
		git hash-object -w --stdin -t tree >in &&

	git pack-objects .git/objects/pack/pack <in

Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
06146c77fd19c096858d6459d602be0fdf10891b
Writing objects: 100% (1/1), done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
ok 1 - repacking corrupt loose object into packed
--- 8< ---

Thanks,
Taylor


* Re: [PATCH 6/6] hash-object: use fsck for object checks
From: Taylor Blau @ 2023-01-18 21:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8haHL9xIWntSm0/@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:
> This is obviously going to be a user-visible behavior change, and the
> test changes earlier in this series show the scope of the impact. But
> I'd argue that this is OK:
>
>   - the documentation for hash-object is already vague about which
>     checks we might do, saying that --literally will allow "any
>     garbage[...] which might not otherwise pass standard object parsing
>     or git-fsck checks". So we are already covered under the documented
>     behavior.
>
>   - users don't generally run hash-object anyway. There are a lot of
>     spots in the tests that needed to be updated because creating
>     garbage objects is something that Git's tests disproportionately do.
>
>   - it's hard to imagine anyone thinking the new behavior is worse. Any
>     object we reject would be a potential problem down the road for the
>     user. And if they really want to create garbage, --literally is
>     already the escape hatch they need.

This is the discussion I was pointing out earlier in the series as
evidence for making this behavior the new default without "--literally".

That being said, let me play devil's advocate for a second. Do the new
fsck checks slow anything in hash-object down significantly? If so, then
it's plausible to imagine a hash-object caller who (a) doesn't use
`--literally`, but (b) does care about throughput if they're writing a
large number of objects at once.

I don't know if such a situation exists, or if these new fsck checks
even slow hash-object down enough to care. But I didn't catch a
discussion of this case in your series, so I figured I'd bring it up
here just in case.

>   - the resulting messages are much better. For example:
>
>       [before]
>       $ echo 'tree 123' | git hash-object -t commit --stdin
>       error: bogus commit object 0000000000000000000000000000000000000000
>       fatal: corrupt commit
>
>       [after]
>       $ echo 'tree 123' | git.compile hash-object -t commit --stdin
>       error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
>       fatal: refusing to create malformed object

Much nicer, well done.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object-file.c          | 55 ++++++++++++++++++------------------------
>  t/t1007-hash-object.sh | 11 +++++++++
>  2 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 80a0cd3b35..5c96384803 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -33,6 +33,7 @@
>  #include "object-store.h"
>  #include "promisor-remote.h"
>  #include "submodule.h"
> +#include "fsck.h"
>
>  /* The maximum size for an object header. */
>  #define MAX_HEADER_LEN 32
> @@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r,
>  	return repo_has_object_file_with_flags(r, oid, 0);
>  }
>
> -static void check_tree(const void *buf, size_t size)
> -{
> -	struct tree_desc desc;
> -	struct name_entry entry;
> -
> -	init_tree_desc(&desc, buf, size);
> -	while (tree_entry(&desc, &entry))
> -		/* do nothing
> -		 * tree_entry() will die() on malformed entries */
> -		;
> -}
> -
> -static void check_commit(const void *buf, size_t size)
> -{
> -	struct commit c;
> -	memset(&c, 0, sizeof(c));
> -	if (parse_commit_buffer(the_repository, &c, buf, size, 0))
> -		die(_("corrupt commit"));
> -}
> -
> -static void check_tag(const void *buf, size_t size)
> -{
> -	struct tag t;
> -	memset(&t, 0, sizeof(t));
> -	if (parse_tag_buffer(the_repository, &t, buf, size))
> -		die(_("corrupt tag"));

OK, here we're getting rid of all of the lightweight checks that
hash-object used to implement on its own.

> +/*
> + * We can't use the normal fsck_error_function() for index_mem(),
> + * because we don't yet have a valid oid for it to report. Instead,
> + * report the minimal fsck error here, and rely on the caller to
> + * give more context.
> + */
> +static int hash_format_check_report(struct fsck_options *opts,
> +				     const struct object_id *oid,
> +				     enum object_type object_type,
> +				     enum fsck_msg_type msg_type,
> +				     enum fsck_msg_id msg_id,
> +				     const char *message)
> +{
> +	error(_("object fails fsck: %s"), message);
> +	return 1;
>  }
>
>  static int index_mem(struct index_state *istate,
> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>  		}
>  	}
>  	if (flags & HASH_FORMAT_CHECK) {
> -		if (type == OBJ_TREE)
> -			check_tree(buf, size);
> -		if (type == OBJ_COMMIT)
> -			check_commit(buf, size);
> -		if (type == OBJ_TAG)
> -			check_tag(buf, size);
> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
> +
> +		opts.strict = 1;
> +		opts.error_func = hash_format_check_report;
> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
> +			die(_("refusing to create malformed object"));
> +		fsck_finish(&opts);
>  	}

And here's the main part of the change, which is delightfully simple and
appears correct to me.

> diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> index 2d2148d8fa..ac3d173767 100755
> --- a/t/t1007-hash-object.sh
> +++ b/t/t1007-hash-object.sh
> @@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' '
>  	grep "empty filename in tree entry" err
>  '
>
> +test_expect_success 'duplicate filename in tree' '
> +	hex_oid=$(echo foo | git hash-object --stdin -w) &&
> +	bin_oid=$(echo $hex_oid | hex2oct) &&
> +	{
> +		printf "100644 file\0$bin_oid" &&
> +		printf "100644 file\0$bin_oid"
> +	} >tree-with-duplicate-filename &&
> +	test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err &&
> +	grep "duplicateEntries" err
> +'
> +

For what it's worth, I think that this is sufficient coverage for the
new fsck checks.

Thanks,
Taylor


* Re: [PATCH 5/6] fsck: provide a function to fsck buffer without object struct
From: Taylor Blau @ 2023-01-18 21:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8haCbAQIV9s/95l@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:43:53PM -0500, Jeff King wrote:
> However, the only external interface that fsck.c provides is
> fsck_object(), which requires an object struct, then promptly discards
> everything except its oid and type. Let's factor out the post-discard
> part of that function as fsck_buffer(), leaving fsck_object() as a thin
> wrapper around it. That will provide more flexibility for callers which
> may not have a struct.

It's really nice that the only thing we care about having an object
struct around for is basically just knowing its type. IOW it seems to
have made the refactoring here pretty straightforward, which is nice
;-).

Thanks,
Taylor


* Re: [PATCH 4/6] t: use hash-object --literally when created malformed objects
From: Taylor Blau @ 2023-01-18 21:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hZlN9rRg1msc0L@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:41:56PM -0500, Jeff King wrote:
> Many test scripts use hash-object to create malformed objects to see how
> we handle the results in various commands. In some cases we already have
> to use "hash-object --literally", because it does some rudimentary
> quality checks. But let's use "--literally" more consistently to
> future-proof these tests against hash-object learning to be more
> careful.

Heh, I suppose this is a good illustration of how loose our checks are
even without `--literally` ;-).

> ---
> This patch is worth looking at because it shows the kinds of things the
> new hash-object from patch 6 will reject.

Obviously we could avoid this patch entirely by making the new behavior
of fscking the incoming objects hidden behind a `--fsck` flag or
something. But I think the decision not to is a good one.

We already have `--literally`, and it makes sense that passing that
should let us write anything, and that not passing it should perform
some validity checks. But I think exactly *what* those checks are is
ambiguous enough that the absence of `--literally` implying fsck checks
isn't out of the question.
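
A quick sketch of that split, using the same malformed commit that the cover
text for patch 6 uses as its example. The temporary repository and the
variable names are only for illustration; the exact error text differs
before and after the series, so the sketch looks at exit status only.

```shell
# Throwaway repository; the path is an assumption for this sketch.
repo=$(mktemp -d)
git init -q "$repo"

# Without --literally, a commit whose "tree" line is not a valid object
# name is refused.
if printf 'tree 123\n' |
	git -C "$repo" hash-object -t commit --stdin >/dev/null 2>&1
then
	checked=accepted
else
	checked=rejected
fi

# With --literally, the same bytes are hashed without format checks.
if printf 'tree 123\n' |
	git -C "$repo" hash-object -t commit --stdin --literally >/dev/null 2>&1
then
	literal=accepted
else
	literal=rejected
fi

echo "checked: $checked, literally: $literal"
rm -rf "$repo"
```

What the series changes is how much gets caught in the first case, not the
existence of the escape hatch in the second.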

You address this in the last patch more thoroughly, but I figure that it
is worth stating some of this here during review to indicate that I
think the direction you pursued here is a good one.

>  t/t1450-fsck.sh                 | 28 ++++++++++++++--------------
>  t/t4054-diff-bogus-tree.sh      |  2 +-
>  t/t4058-diff-duplicates.sh      |  2 +-
>  t/t4212-log-corrupt.sh          |  4 ++--
>  t/t5302-pack-index.sh           |  2 +-
>  t/t5504-fetch-receive-strict.sh |  2 +-
>  t/t5702-protocol-v2.sh          |  2 +-
>  t/t6300-for-each-ref.sh         |  2 +-
>  t/t7509-commit-authorship.sh    |  2 +-
>  t/t7510-signed-commit.sh        |  2 +-
>  t/t7528-signed-commit-ssh.sh    |  2 +-
>  t/t8003-blame-corner-cases.sh   |  2 +-
>  t/t9350-fast-export.sh          |  2 +-
>  13 files changed, 27 insertions(+), 27 deletions(-)

And these all look good, too. Each of the spots you touch here is
limited to replacing "git hash-object" with "git hash-object --literally".

Thanks,
Taylor


* Re: [PATCH 1/6] t1007: modernize malformed object tests
From: Taylor Blau @ 2023-01-18 21:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hYEgMze3bY44/0@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:35:30PM -0500, Jeff King wrote:
> The tests in t1007 for detecting malformed objects have two
> anachronisms:
>
>  - they use "sha1" instead of "oid" in variable names, even though the
>    script as a whole has been adapted to handle sha256

I appreciate you saying that we should s/sha1/oid/ here. But more
importantly, thanks for drawing attention to the fact that this script
already handles sha256, and that the update is purely cosmetic.

> ---
>  t/t1007-hash-object.sh | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

These look obviously correct.

Thanks,
Taylor


* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Junio C Hamano @ 2023-01-18 20:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

>   [1/6]: t1007: modernize malformed object tests

Obviously good.

>   [2/6]: t1006: stop using 0-padded timestamps
>   [3/6]: t7030: stop using invalid tag name

These two are pleasant to see and revealed what are "accepted" by
mistake, quite surprisingly.

>   [4/6]: t: use hash-object --literally when created malformed objects

The --literally option was invented initially primarily to allow a
bogus type of object (e.g. "hash-object -t xyzzy --literally") but I
am happy to see that we are finding different uses.  I wonder if
these objects of known types but with syntactically bad contents can
be "repack"ed from loose into packed?

>   [5/6]: fsck: provide a function to fsck buffer without object struct

Obvious, clean and very nice.

>   [6/6]: hash-object: use fsck for object checks


* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Jeff King @ 2023-01-18 20:46 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:35:06PM -0500, Jeff King wrote:

> The other option is having the fsck code avoid looking past the size it
> was given. I think the intent is that this should work, from commits
> like 4d0d89755e (Make sure fsck_commit_buffer() does not run out of the
> buffer, 2014-09-11). We do use skip_prefix() and parse_oid_hex(), which
> won't respect the size, but I think[1] that's OK because we'll have
> parsed up to the end-of-header beforehand (and those functions would
> never match past there).
> 
> Which would mean that 9a1a3a4d4c (mktag: allow omitting the header/body
> \n separator, 2021-01-05) and acf9de4c94 (mktag: use fsck instead of
> custom verify_tag(), 2021-01-05) were buggy, and we can just fix them.

That would look something like this:

diff --git a/fsck.c b/fsck.c
index c2c8facd2d..d220276bcb 100644
--- a/fsck.c
+++ b/fsck.c
@@ -898,6 +898,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 {
 	int ret = 0;
 	char *eol;
+	const char *eob = buffer + size;
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
@@ -960,10 +961,8 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 	}
 	else
 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
-	if (!*buffer)
-		goto done;
 
-	if (!starts_with(buffer, "\n")) {
+	if (buffer != eob && *buffer != '\n') {
 		/*
 		 * The verify_headers() check will allow
 		 * e.g. "[...]tagger <tagger>\nsome

Changing the starts_with() is not strictly necessary, but I think it
makes it more clear that we are only going to look at the one character
we confirmed is still valid inside the buffer.

This is enough to have the whole test suite pass with ASan/UBSan after
my series. But as I said earlier, I'd want to look carefully at the rest
of the fsck code to make sure there aren't any other possible inputs
that could look past the end of the buffer.

-Peff


* Re: [PATCH v6 2/2] send-email: expose header information to git-send-email's sendemail-validate hook
From: Michael Strawbridge @ 2023-01-18 20:44 UTC (permalink / raw)
  To: Luben Tuikov, Junio C Hamano; +Cc: git@vger.kernel.org
In-Reply-To: <fa9b1371-0a61-147f-637e-cb09f775fe22@amd.com>


On 2023-01-18 11:35, Luben Tuikov wrote:
> On 2023-01-18 11:27, Junio C Hamano wrote:
>> Luben Tuikov <luben.tuikov@amd.com> writes:
>>
>>> On 2023-01-17 02:31, Junio C Hamano wrote:
>>>> Luben Tuikov <luben.tuikov@amd.com> writes:
>>>>
>>>>>> +test_expect_success $PREREQ "--validate hook supports header argument" '
>>>>>> +	write_script my-hooks/sendemail-validate <<-\EOF &&
>>>>>> +	if test -s "$2"
>>>>>> +	then
>>>>>> +		cat "$2" >actual
>>>>>> +		exit 1
>>>>>> +	fi
>>>>>> +	EOF
>>>> If "$2" is not given, or an empty "$2" is given, is that an error?
>>>> I am wondering if the lack of "else" clause (and the hook exits with
>>>> success when "$2" is an empty file) here is intentional.
>>> I think we'll always have a $2, since it is the SMTP envelope and headers.
>> We write our tests to verify _that_ assumption you have.  A future
>> developer mistakenly drops the code to append the file to the
>> command line that invokes the hook, and we want our test to catch
>> such a mistake.
>>
>> Do we really feed envelope?  E.g. if the --envelope-sender=<who> is
>> used, does $2 have the "From:" from the header and "MAIL FROM" from
>> the envelope separately?
> I'm not sure--I thought we did, but yes, we should _test_ that we indeed
> 1) have/get $2, as a non-empty string,
> 2) it is a non-empty, readable file,
> 3) contains the test header we included in git-format-patch in the test.
>
> This is what I meant when I wrote "we'll always have $2 ...", not having it
> is failure of some kind and yes we should test for it.

I've tested using --envelope-sender=<who> and the hook only gets the
headers.  I've applied the feedback above in patch set v8, including a
test for the 2nd argument.  The new test will fail if either the
supplied argument is not a file or the custom header is not found.
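
For illustration only, the kind of check described here could look roughly
like the following. This is a sketch, not the hook from v8: the
validate_headers function, the temporary files and the "X-test-header: v1.0"
header are hypothetical names chosen for the example.

```shell
# validate_headers <patch> <header-file>: hypothetical hook body that fails
# unless the second argument is a non-empty file containing the header.
validate_headers () {
	# "test -s" is false for a missing argument, a missing file
	# and an empty file alike.
	if ! test -s "$2"
	then
		echo "header file missing or empty" >&2
		return 1
	fi
	if ! grep -q "^X-test-header: v1.0$" "$2"
	then
		echo "expected header not found" >&2
		return 1
	fi
	return 0
}

tmp=$(mktemp -d)
printf 'From: A U Thor <author@example.com>\nX-test-header: v1.0\n' >"$tmp/good"
printf 'From: A U Thor <author@example.com>\n' >"$tmp/no-header"
: >"$tmp/empty"

validate_headers patch "$tmp/good"; rc_good=$?
validate_headers patch "$tmp/no-header" 2>/dev/null; rc_no_header=$?
validate_headers patch "$tmp/empty" 2>/dev/null; rc_empty=$?
validate_headers patch 2>/dev/null; rc_missing=$?
rm -rf "$tmp"
```

Every failure path returns non-zero explicitly, so a caller that forgets to
pass the second argument is caught rather than silently accepted.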



* [PATCH 6/6] hash-object: use fsck for object checks
From: Jeff King @ 2023-01-18 20:44 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

Since c879daa237 (Make hash-object more robust against malformed
objects, 2011-02-05), we've done some rudimentary checks against objects
we're about to write by running them through our usual parsers for
trees, commits, and tags.

These parsers catch some problems, but they are not nearly as careful as
the fsck functions (which makes sense; the parsers are designed to be
fast and forgiving, bailing only when the input is unintelligible). We
are better off doing the more thorough fsck checks when writing objects.
Doing so at write time is much better than writing garbage only to find
out later (after building more history atop it!) that fsck complains
about it, or hosts with transfer.fsckObjects reject it.

This is obviously going to be a user-visible behavior change, and the
test changes earlier in this series show the scope of the impact. But
I'd argue that this is OK:

  - the documentation for hash-object is already vague about which
    checks we might do, saying that --literally will allow "any
    garbage[...] which might not otherwise pass standard object parsing
    or git-fsck checks". So we are already covered under the documented
    behavior.

  - users don't generally run hash-object anyway. There are a lot of
    spots in the tests that needed to be updated because creating
    garbage objects is something that Git's tests disproportionately do.

  - it's hard to imagine anyone thinking the new behavior is worse. Any
    object we reject would be a potential problem down the road for the
    user. And if they really want to create garbage, --literally is
    already the escape hatch they need.

Note that the change here is actually in index_mem(), which handles the
HASH_FORMAT_CHECK flag passed by hash-object. That flag is also used by
"git-replace --edit" to sanity-check the result. Covering that with more
thorough checks likewise seems like a good thing.

Besides being more thorough, there are a few other bonuses:

  - we get rid of some questionable stack allocations of object structs.
    These don't seem to currently cause any problems in practice, but
    they subtly violate some of the assumptions made by the rest of the
    code (e.g., the "struct commit" we put on the stack and
    zero-initialize will not have a proper index from
    alloc_commit_index()).

  - likewise, those parsed object structs are the source of some small
    memory leaks

  - the resulting messages are much better. For example:

      [before]
      $ echo 'tree 123' | git hash-object -t commit --stdin
      error: bogus commit object 0000000000000000000000000000000000000000
      fatal: corrupt commit

      [after]
      $ echo 'tree 123' | git.compile hash-object -t commit --stdin
      error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
      fatal: refusing to create malformed object

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c          | 55 ++++++++++++++++++------------------------
 t/t1007-hash-object.sh | 11 +++++++++
 2 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/object-file.c b/object-file.c
index 80a0cd3b35..5c96384803 100644
--- a/object-file.c
+++ b/object-file.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "submodule.h"
+#include "fsck.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r,
 	return repo_has_object_file_with_flags(r, oid, 0);
 }
 
-static void check_tree(const void *buf, size_t size)
-{
-	struct tree_desc desc;
-	struct name_entry entry;
-
-	init_tree_desc(&desc, buf, size);
-	while (tree_entry(&desc, &entry))
-		/* do nothing
-		 * tree_entry() will die() on malformed entries */
-		;
-}
-
-static void check_commit(const void *buf, size_t size)
-{
-	struct commit c;
-	memset(&c, 0, sizeof(c));
-	if (parse_commit_buffer(the_repository, &c, buf, size, 0))
-		die(_("corrupt commit"));
-}
-
-static void check_tag(const void *buf, size_t size)
-{
-	struct tag t;
-	memset(&t, 0, sizeof(t));
-	if (parse_tag_buffer(the_repository, &t, buf, size))
-		die(_("corrupt tag"));
+/*
+ * We can't use the normal fsck_error_function() for index_mem(),
+ * because we don't yet have a valid oid for it to report. Instead,
+ * report the minimal fsck error here, and rely on the caller to
+ * give more context.
+ */
+static int hash_format_check_report(struct fsck_options *opts,
+				     const struct object_id *oid,
+				     enum object_type object_type,
+				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
+				     const char *message)
+{
+	error(_("object fails fsck: %s"), message);
+	return 1;
 }
 
 static int index_mem(struct index_state *istate,
@@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
 		}
 	}
 	if (flags & HASH_FORMAT_CHECK) {
-		if (type == OBJ_TREE)
-			check_tree(buf, size);
-		if (type == OBJ_COMMIT)
-			check_commit(buf, size);
-		if (type == OBJ_TAG)
-			check_tag(buf, size);
+		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
+
+		opts.strict = 1;
+		opts.error_func = hash_format_check_report;
+		if (fsck_buffer(null_oid(), type, buf, size, &opts))
+			die(_("refusing to create malformed object"));
+		fsck_finish(&opts);
 	}
 
 	if (write_object)
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 2d2148d8fa..ac3d173767 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' '
 	grep "empty filename in tree entry" err
 '
 
+test_expect_success 'duplicate filename in tree' '
+	hex_oid=$(echo foo | git hash-object --stdin -w) &&
+	bin_oid=$(echo $hex_oid | hex2oct) &&
+	{
+		printf "100644 file\0$bin_oid" &&
+		printf "100644 file\0$bin_oid"
+	} >tree-with-duplicate-filename &&
+	test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err &&
+	grep "duplicateEntries" err
+'
+
 test_expect_success 'corrupt commit' '
 	test_must_fail git hash-object -t commit --stdin </dev/null
 '
-- 
2.39.1.616.gd06fca9e99
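
For readers who want the reporting idea from the patch above in isolation,
here is a minimal sketch. It is not Git's code: check_options,
minimal_report() and check_commit_buffer() are hypothetical stand-ins, and
the "must start with 'tree '" rule is a toy check. It only shows a checker
that hands each problem to a caller-supplied callback, where the callback
prints the bare message (no object id exists yet) and returns non-zero so
the caller can refuse the object.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not Git's types. */
struct check_options;
typedef int (*report_fn)(struct check_options *opts, const char *message);

struct check_options {
	report_fn report;   /* called once per problem found */
	int reported;       /* how many problems the callback has seen */
};

/* Minimal reporter: no object id is known yet, so print only the message. */
static int minimal_report(struct check_options *opts, const char *message)
{
	opts->reported++;
	fprintf(stderr, "error: object fails check: %s\n", message);
	return 1; /* non-zero tells the checker's caller the object is bad */
}

/* Toy checker: a "commit" buffer must begin with "tree ". */
static int check_commit_buffer(const char *buf, size_t size,
			       struct check_options *opts)
{
	if (size < 5 || memcmp(buf, "tree ", 5))
		return opts->report(opts, "missing 'tree' line");
	return 0;
}
```

A caller would fill in check_options with minimal_report, run the checker
on the buffer, and die with its own context-specific message if the result
is non-zero.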


* [PATCH 5/6] fsck: provide a function to fsck buffer without object struct
From: Jeff King @ 2023-01-18 20:43 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

The fsck code has been slowly moving away from requiring an object
struct in commits like 103fb6d43b (fsck: accept an oid instead of a
"struct tag" for fsck_tag(), 2019-10-18), c5b4269b57 (fsck: accept an
oid instead of a "struct commit" for fsck_commit(), 2019-10-18), etc.

However, the only external interface that fsck.c provides is
fsck_object(), which requires an object struct, then promptly discards
everything except its oid and type. Let's factor out the post-discard
part of that function as fsck_buffer(), leaving fsck_object() as a thin
wrapper around it. That will provide more flexibility for callers which
may not have a struct.

Signed-off-by: Jeff King <peff@peff.net>
---
This is obviously preparation for the next patch. But I suspect it could
be used elsewhere, too. Regular fsck wants object structs anyway to hold
flags, I think, but index-pack could probably save some memory and
effort by avoiding them. I didn't look too closely, as it's all out of
scope for this series.

 fsck.c | 29 ++++++++++++++++++-----------
 fsck.h |  8 ++++++++
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fsck.c b/fsck.c
index 47eaeedd70..c2c8facd2d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1237,19 +1237,26 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	if (!obj)
 		return report(options, NULL, OBJ_NONE, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
 
-	if (obj->type == OBJ_BLOB)
-		return fsck_blob(&obj->oid, data, size, options);
-	if (obj->type == OBJ_TREE)
-		return fsck_tree(&obj->oid, data, size, options);
-	if (obj->type == OBJ_COMMIT)
-		return fsck_commit(&obj->oid, data, size, options);
-	if (obj->type == OBJ_TAG)
-		return fsck_tag(&obj->oid, data, size, options);
-
-	return report(options, &obj->oid, obj->type,
+	return fsck_buffer(&obj->oid, obj->type, data, size, options);
+}
+
+int fsck_buffer(const struct object_id *oid, enum object_type type,
+		void *data, unsigned long size,
+		struct fsck_options *options)
+{
+	if (type == OBJ_BLOB)
+		return fsck_blob(oid, data, size, options);
+	if (type == OBJ_TREE)
+		return fsck_tree(oid, data, size, options);
+	if (type == OBJ_COMMIT)
+		return fsck_commit(oid, data, size, options);
+	if (type == OBJ_TAG)
+		return fsck_tag(oid, data, size, options);
+
+	return report(options, oid, type,
 		      FSCK_MSG_UNKNOWN_TYPE,
 		      "unknown type '%d' (internal fsck error)",
-		      obj->type);
+		      type);
 }
 
 int fsck_error_function(struct fsck_options *o,
diff --git a/fsck.h b/fsck.h
index fcecf4101c..668330880e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -183,6 +183,14 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
+/*
+ * Same as fsck_object(), but for when the caller doesn't have an object
+ * struct.
+ */
+int fsck_buffer(const struct object_id *oid, enum object_type,
+		void *data, unsigned long size,
+		struct fsck_options *options);
+
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
-- 
2.39.1.616.gd06fca9e99
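
The refactoring described in the commit message above (a struct-taking
entry point that becomes a thin wrapper around a function taking only the
id, type and buffer) can be sketched on its own. Everything below is a
simplified assumption for illustration: obj, obj_id, check_buffer() and
check_object() are hypothetical names, and the "must not be empty" rule
stands in for the real per-type checks.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Git's real definitions. */
enum obj_type { TYPE_BLOB = 1, TYPE_TREE, TYPE_COMMIT, TYPE_TAG };

struct obj_id { unsigned char hash[20]; };

struct obj {
	enum obj_type type;
	struct obj_id id;
	unsigned flags;   /* bookkeeping the checker never looks at */
};

/* Core check: needs only the id, the type and the raw bytes. */
static int check_buffer(const struct obj_id *id, enum obj_type type,
			const void *data, size_t size)
{
	(void)id;
	(void)data;
	switch (type) {
	case TYPE_BLOB:
		return 0;           /* toy rule: any bytes are a valid blob */
	case TYPE_TREE:
	case TYPE_COMMIT:
	case TYPE_TAG:
		return size == 0;   /* toy rule: these must not be empty */
	}
	return -1;                  /* unknown type */
}

/* Thin wrapper for callers that already hold a full object struct. */
static int check_object(const struct obj *o, const void *data, size_t size)
{
	if (!o)
		return -1;
	return check_buffer(&o->id, o->type, data, size);
}
```

Callers that have no struct (for example, code that is still computing the
id) can call check_buffer() directly, while existing callers keep using
check_object() unchanged.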



* Re: [PATCH] worktree add: introduce basic DWYM for --orphan
From: Jacob Abel @ 2023-01-18 20:43 UTC (permalink / raw)
  To: phillip.wood
  Cc: git, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Junio C Hamano, Phillip Wood, Rubén Justo, Taylor Blau,
	rsbecker
In-Reply-To: <cddc6987-3b58-4688-65f8-3da0fbd1cc51@dunelm.org.uk>

On 23/01/16 10:52AM, Phillip Wood wrote:
> Hi Jacob
>
> On 14/01/2023 22:50, Jacob Abel wrote:
> > Introduces a DWYM shorthand of --orphan for when the worktree directory
> > and the to-be-created branch share the same name.
> >
> > Current Behavior:
> >      % git worktree list
> >      /path/to/git/repo        a38d39a4c5 [main]
> >      % git worktree add --orphan new_branch ../new_branch/
> >      Preparing worktree (new branch 'new_branch')
> >      % git worktree add --orphan ../new_branch2/
> >      usage: git worktree add [<options>] <path> [<commit-ish>]
> >         or: git worktree list [<options>]
> >      [...]
> >      %
> >
> > New Behavior:
> >
> >      % git worktree list
> >      /path/to/git/repo        a38d39a4c5 [main]
> >      % git worktree add --orphan new_branch ../new_branch/
> >      Preparing worktree (new branch 'new_branch')
> >      % git worktree list
> >      /path/to/git/repo        a38d39a4c5 [main]
> >      /path/to/git/new_branch  a38d39a4c5 [new_branch]
> >      % git worktree add --orphan ../new_branch2/
> >      Preparing worktree (new branch 'new_branch2')
> >      % git worktree list
> >      /path/to/git/repo        a38d39a4c5 [main]
> >      /path/to/git/new_branch  a38d39a4c5 [new_branch]
> >      /path/to/git/new_branch2 a38d39a4c5 [new_branch2]
> >      %
>
> Thanks for working on this. As I said in my previous mail I think it
> would be easier to use OPT_BOOL() for --orphan from the start. By using
> OPT_STRING() you'll run into problems with "git worktree add --orphan
> --lock <directory>"
>
> Best Wishes
>
> Phillip
>
> > [...]

Ah, good point. I missed that.

Also given the way the conversation is going, I'll drop this patch and integrate
the changes into the patches of the main series since I'll be re-rolling.



* [PATCH 4/6] t: use hash-object --literally when creating malformed objects
From: Jeff King @ 2023-01-18 20:41 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

Many test scripts use hash-object to create malformed objects to see how
we handle the results in various commands. In some cases we already have
to use "hash-object --literally", because it does some rudimentary
quality checks. But let's use "--literally" more consistently to
future-proof these tests against hash-object learning to be more
careful.

Signed-off-by: Jeff King <peff@peff.net>
---
This patch is worth looking at because it shows the kinds of things the
new hash-object from patch 6 will reject.

Most of these are obviously terrible things that we'd want to complain
about, like broken emails, embedded NULs, and so on. The most
contentious one is probably a tag without a tagger line; such tags were
generated by early versions of Git (e.g., see Git's v0.99 tag). This is
an "info" in fsck (which is semantically like a warning, except
transfer.fsckObjects treats warnings as errors due to hysterical
raisins). But the hash-object change in patch 6 will reject it, because
it operates in strict mode.

That seems reasonable to me, since we're helping users avoid doing bad
things, and not dealing with existing objects.

 t/t1450-fsck.sh                 | 28 ++++++++++++++--------------
 t/t4054-diff-bogus-tree.sh      |  2 +-
 t/t4058-diff-duplicates.sh      |  2 +-
 t/t4212-log-corrupt.sh          |  4 ++--
 t/t5302-pack-index.sh           |  2 +-
 t/t5504-fetch-receive-strict.sh |  2 +-
 t/t5702-protocol-v2.sh          |  2 +-
 t/t6300-for-each-ref.sh         |  2 +-
 t/t7509-commit-authorship.sh    |  2 +-
 t/t7510-signed-commit.sh        |  2 +-
 t/t7528-signed-commit-ssh.sh    |  2 +-
 t/t8003-blame-corner-cases.sh   |  2 +-
 t/t9350-fast-export.sh          |  2 +-
 13 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index de0f6d5e7f..fdb886dfe4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -212,7 +212,7 @@ test_expect_success 'email without @ is okay' '
 test_expect_success 'email with embedded > is not okay' '
 	git cat-file commit HEAD >basis &&
 	sed "s/@[a-z]/&>/" basis >bad-email &&
-	new=$(git hash-object -t commit -w --stdin <bad-email) &&
+	new=$(git hash-object --literally -t commit -w --stdin <bad-email) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -223,7 +223,7 @@ test_expect_success 'email with embedded > is not okay' '
 test_expect_success 'missing < email delimiter is reported nicely' '
 	git cat-file commit HEAD >basis &&
 	sed "s/<//" basis >bad-email-2 &&
-	new=$(git hash-object -t commit -w --stdin <bad-email-2) &&
+	new=$(git hash-object --literally -t commit -w --stdin <bad-email-2) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -234,7 +234,7 @@ test_expect_success 'missing < email delimiter is reported nicely' '
 test_expect_success 'missing email is reported nicely' '
 	git cat-file commit HEAD >basis &&
 	sed "s/[a-z]* <[^>]*>//" basis >bad-email-3 &&
-	new=$(git hash-object -t commit -w --stdin <bad-email-3) &&
+	new=$(git hash-object --literally -t commit -w --stdin <bad-email-3) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -245,7 +245,7 @@ test_expect_success 'missing email is reported nicely' '
 test_expect_success '> in name is reported' '
 	git cat-file commit HEAD >basis &&
 	sed "s/ </> </" basis >bad-email-4 &&
-	new=$(git hash-object -t commit -w --stdin <bad-email-4) &&
+	new=$(git hash-object --literally -t commit -w --stdin <bad-email-4) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -258,7 +258,7 @@ test_expect_success 'integer overflow in timestamps is reported' '
 	git cat-file commit HEAD >basis &&
 	sed "s/^\\(author .*>\\) [0-9]*/\\1 18446744073709551617/" \
 		<basis >bad-timestamp &&
-	new=$(git hash-object -t commit -w --stdin <bad-timestamp) &&
+	new=$(git hash-object --literally -t commit -w --stdin <bad-timestamp) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -269,7 +269,7 @@ test_expect_success 'integer overflow in timestamps is reported' '
 test_expect_success 'commit with NUL in header' '
 	git cat-file commit HEAD >basis &&
 	sed "s/author ./author Q/" <basis | q_to_nul >commit-NUL-header &&
-	new=$(git hash-object -t commit -w --stdin <commit-NUL-header) &&
+	new=$(git hash-object --literally -t commit -w --stdin <commit-NUL-header) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -292,7 +292,7 @@ test_expect_success 'tree object with duplicate entries' '
 			git cat-file tree $T &&
 			git cat-file tree $T
 		) |
-		git hash-object -w -t tree --stdin
+		git hash-object --literally -w -t tree --stdin
 	) &&
 	test_must_fail git fsck 2>out &&
 	test_i18ngrep "error in tree .*contains duplicate file entries" out
@@ -426,7 +426,7 @@ test_expect_success 'tag with incorrect tag name & missing tagger' '
 	This is an invalid tag.
 	EOF
 
-	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	tag=$(git hash-object --literally -t tag -w --stdin <wrong-tag) &&
 	test_when_finished "remove_object $tag" &&
 	echo $tag >.git/refs/tags/wrong &&
 	test_when_finished "git update-ref -d refs/tags/wrong" &&
@@ -558,7 +558,7 @@ test_expect_success 'rev-list --verify-objects with commit graph (parent)' '
 test_expect_success 'force fsck to ignore double author' '
 	git cat-file commit HEAD >basis &&
 	sed "s/^author .*/&,&/" <basis | tr , \\n >multiple-authors &&
-	new=$(git hash-object -t commit -w --stdin <multiple-authors) &&
+	new=$(git hash-object --literally -t commit -w --stdin <multiple-authors) &&
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
@@ -573,7 +573,7 @@ test_expect_success 'fsck notices blob entry pointing to null sha1' '
 	(git init null-blob &&
 	 cd null-blob &&
 	 sha=$(printf "100644 file$_bz$_bzoid" |
-	       git hash-object -w --stdin -t tree) &&
+	       git hash-object --literally -w --stdin -t tree) &&
 	  git fsck 2>out &&
 	  test_i18ngrep "warning.*null sha1" out
 	)
@@ -583,7 +583,7 @@ test_expect_success 'fsck notices submodule entry pointing to null sha1' '
 	(git init null-commit &&
 	 cd null-commit &&
 	 sha=$(printf "160000 submodule$_bz$_bzoid" |
-	       git hash-object -w --stdin -t tree) &&
+	       git hash-object --literally -w --stdin -t tree) &&
 	  git fsck 2>out &&
 	  test_i18ngrep "warning.*null sha1" out
 	)
@@ -648,7 +648,7 @@ test_expect_success 'NUL in commit' '
 		git commit --allow-empty -m "initial commitQNUL after message" &&
 		git cat-file commit HEAD >original &&
 		q_to_nul <original >munged &&
-		git hash-object -w -t commit --stdin <munged >name &&
+		git hash-object --literally -w -t commit --stdin <munged >name &&
 		git branch bad $(cat name) &&
 
 		test_must_fail git -c fsck.nulInCommit=error fsck 2>warn.1 &&
@@ -794,8 +794,8 @@ test_expect_success 'fsck errors in packed objects' '
 	git cat-file commit HEAD >basis &&
 	sed "s/</one/" basis >one &&
 	sed "s/</foo/" basis >two &&
-	one=$(git hash-object -t commit -w one) &&
-	two=$(git hash-object -t commit -w two) &&
+	one=$(git hash-object --literally -t commit -w one) &&
+	two=$(git hash-object --literally -t commit -w two) &&
 	pack=$(
 		{
 			echo $one &&
diff --git a/t/t4054-diff-bogus-tree.sh b/t/t4054-diff-bogus-tree.sh
index 294fb55313..05c88f8cdf 100755
--- a/t/t4054-diff-bogus-tree.sh
+++ b/t/t4054-diff-bogus-tree.sh
@@ -10,7 +10,7 @@ test_expect_success 'create bogus tree' '
 	bogus_tree=$(
 		printf "100644 fooQ$name" |
 		q_to_nul |
-		git hash-object -w --stdin -t tree
+		git hash-object --literally -w --stdin -t tree
 	)
 '
 
diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh
index 54614b814d..2501c89c1c 100755
--- a/t/t4058-diff-duplicates.sh
+++ b/t/t4058-diff-duplicates.sh
@@ -29,7 +29,7 @@ make_tree () {
 		make_tree_entry "$1" "$2" "$3"
 		shift; shift; shift
 	done |
-	git hash-object -w -t tree --stdin
+	git hash-object --literally -w -t tree --stdin
 }
 
 # this is kind of a convoluted setup, but matches
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 30a219894b..e89e1f54b6 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -10,7 +10,7 @@ test_expect_success 'setup' '
 
 	git cat-file commit HEAD |
 	sed "/^author /s/>/>-<>/" >broken_email.commit &&
-	git hash-object -w -t commit broken_email.commit >broken_email.hash &&
+	git hash-object --literally -w -t commit broken_email.commit >broken_email.hash &&
 	git update-ref refs/heads/broken_email $(cat broken_email.hash)
 '
 
@@ -46,7 +46,7 @@ test_expect_success 'git log --format with broken author email' '
 munge_author_date () {
 	git cat-file commit "$1" >commit.orig &&
 	sed "s/^\(author .*>\) [0-9]*/\1 $2/" <commit.orig >commit.munge &&
-	git hash-object -w -t commit commit.munge
+	git hash-object --literally -w -t commit commit.munge
 }
 
 test_expect_success 'unparsable dates produce sentinel value' '
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index b0095ab41d..59e9e77223 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -263,7 +263,7 @@ tag guten tag
 This is an invalid tag.
 EOF
 
-	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	tag=$(git hash-object -t tag -w --stdin --literally <wrong-tag) &&
 	pack1=$(echo $tag $sha | git pack-objects tag-test) &&
 	echo remove tag object &&
 	thirtyeight=${tag#??} &&
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index ac4099ca89..88d3c56750 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -138,7 +138,7 @@ This commit object intentionally broken
 EOF
 
 test_expect_success 'setup bogus commit' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
+	commit="$(git hash-object --literally -t commit -w --stdin <bogus-commit)"
 '
 
 test_expect_success 'fsck with no skipList input' '
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index b33cd4afca..e4db7513f4 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -1114,7 +1114,7 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
 
 	This commit object intentionally broken
 	EOF
-	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
+	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin --literally <bogus-commit) &&
 	git -C "$P" branch bogus-branch "$BOGUS" &&
 
 	echo my-blob >"$P/my-blob" &&
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 2ae1fc721b..c466fd989f 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -606,7 +606,7 @@ test_expect_success 'create tag without tagger' '
 	git tag -a -m "Broken tag" taggerless &&
 	git tag -f taggerless $(git cat-file tag taggerless |
 		sed -e "/^tagger /d" |
-		git hash-object --stdin -w -t tag)
+		git hash-object --literally --stdin -w -t tag)
 '
 
 test_atom refs/tags/taggerless type 'commit'
diff --git a/t/t7509-commit-authorship.sh b/t/t7509-commit-authorship.sh
index 21c668f75e..5d890949f7 100755
--- a/t/t7509-commit-authorship.sh
+++ b/t/t7509-commit-authorship.sh
@@ -105,7 +105,7 @@ test_expect_success '--amend option with empty author' '
 test_expect_success '--amend option with missing author' '
 	git cat-file commit Initial >tmp &&
 	sed "s/author [^<]* </author </" tmp >malformed &&
-	sha=$(git hash-object -t commit -w malformed) &&
+	sha=$(git hash-object --literally -t commit -w malformed) &&
 	test_when_finished "remove_object $sha" &&
 	git checkout $sha &&
 	test_when_finished "git checkout Initial" &&
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 8593b7e3cb..bc7a31ba3e 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -202,7 +202,7 @@ test_expect_success GPG 'detect fudged signature with NUL' '
 	git cat-file commit seventh-signed >raw &&
 	cat raw >forged2 &&
 	echo Qwik | tr "Q" "\000" >>forged2 &&
-	git hash-object -w -t commit forged2 >forged2.commit &&
+	git hash-object --literally -w -t commit forged2 >forged2.commit &&
 	test_must_fail git verify-commit $(cat forged2.commit) &&
 	git show --pretty=short --show-signature $(cat forged2.commit) >actual2 &&
 	grep "BAD signature from" actual2 &&
diff --git a/t/t7528-signed-commit-ssh.sh b/t/t7528-signed-commit-ssh.sh
index f47e995179..065f780636 100755
--- a/t/t7528-signed-commit-ssh.sh
+++ b/t/t7528-signed-commit-ssh.sh
@@ -270,7 +270,7 @@ test_expect_success GPGSSH 'detect fudged signature with NUL' '
 	git cat-file commit seventh-signed >raw &&
 	cat raw >forged2 &&
 	echo Qwik | tr "Q" "\000" >>forged2 &&
-	git hash-object -w -t commit forged2 >forged2.commit &&
+	git hash-object --literally -w -t commit forged2 >forged2.commit &&
 	test_must_fail git verify-commit $(cat forged2.commit) &&
 	git show --pretty=short --show-signature $(cat forged2.commit) >actual2 &&
 	grep "${GPGSSH_BAD_SIGNATURE}" actual2 &&
diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index d751d48b7d..8bcd39e81b 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -201,7 +201,7 @@ committer David Reiss <dreiss@facebook.com> 1234567890 +0000
 
 some message
 EOF
-  COMMIT=$(git hash-object -t commit -w badcommit) &&
+  COMMIT=$(git hash-object --literally -t commit -w badcommit) &&
   git --no-pager blame $COMMIT -- uno >/dev/null
 '
 
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index ff21a12ee6..26c25c0eb2 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -373,7 +373,7 @@ EOF
 
 test_expect_success 'cope with tagger-less tags' '
 
-	TAG=$(git hash-object -t tag -w tag-content) &&
+	TAG=$(git hash-object --literally -t tag -w tag-content) &&
 	git update-ref refs/tags/sonnenschein $TAG &&
 	git fast-export -C -C --signed-tags=strip --all > output &&
 	test $(grep -c "^tag " output) = 4 &&
-- 
2.39.1.616.gd06fca9e99



* [PATCH 3/6] t7030: stop using invalid tag name
From: Jeff King @ 2023-01-18 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

We intentionally invalidate the signature of a tag by switching its tag
name from "seventh" to "7th forged". However, the latter is not a valid
tag name because it contains a space. This doesn't currently affect the
test, but we're better off using something syntactically valid. That
reduces the number of possible failure modes in the test, and
future-proofs us if git hash-object gets more picky about its input.

The t7031 script, which was mostly copied from t7030, has the same
problem, so we'll fix it, too.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7030-verify-tag.sh            | 2 +-
 t/t7031-verify-tag-signed-ssh.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7030-verify-tag.sh b/t/t7030-verify-tag.sh
index 10faa64515..6f526c37c2 100755
--- a/t/t7030-verify-tag.sh
+++ b/t/t7030-verify-tag.sh
@@ -115,7 +115,7 @@ test_expect_success GPGSM 'verify and show signatures x509 with high minTrustLev
 
 test_expect_success GPG 'detect fudged signature' '
 	git cat-file tag seventh-signed >raw &&
-	sed -e "/^tag / s/seventh/7th forged/" raw >forged1 &&
+	sed -e "/^tag / s/seventh/7th-forged/" raw >forged1 &&
 	git hash-object -w -t tag forged1 >forged1.tag &&
 	test_must_fail git verify-tag $(cat forged1.tag) 2>actual1 &&
 	grep "BAD signature from" actual1 &&
diff --git a/t/t7031-verify-tag-signed-ssh.sh b/t/t7031-verify-tag-signed-ssh.sh
index 1cb36b9ab8..36eb86a4b1 100755
--- a/t/t7031-verify-tag-signed-ssh.sh
+++ b/t/t7031-verify-tag-signed-ssh.sh
@@ -125,7 +125,7 @@ test_expect_success GPGSSH,GPGSSH_VERIFYTIME 'verify-tag failes with tag date ou
 test_expect_success GPGSSH 'detect fudged ssh signature' '
 	test_config gpg.ssh.allowedSignersFile "${GPGSSH_ALLOWED_SIGNERS}" &&
 	git cat-file tag seventh-signed >raw &&
-	sed -e "/^tag / s/seventh/7th forged/" raw >forged1 &&
+	sed -e "/^tag / s/seventh/7th-forged/" raw >forged1 &&
 	git hash-object -w -t tag forged1 >forged1.tag &&
 	test_must_fail git verify-tag $(cat forged1.tag) 2>actual1 &&
 	grep "${GPGSSH_BAD_SIGNATURE}" actual1 &&
-- 
2.39.1.616.gd06fca9e99



* [PATCH 2/6] t1006: stop using 0-padded timestamps
From: Jeff King @ 2023-01-18 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

The fake objects in t1006 use dummy timestamps like "0000000000 +0000".
While this does make them look more like normal timestamps (which,
unless it is 1970, have many digits), it actually violates our fsck
checks, which complain about zero-padded timestamps.

This doesn't currently break anything, but let's future-proof our tests
against a version of hash-object which is a little more careful about
its input. We don't actually care about the exact values here (and in
fact, the helper functions in this script end up removing the timestamps
anyway, so we don't even have to adjust other parts of the tests).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1006-cat-file.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 23b8942edb..2d875b17d8 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -292,8 +292,8 @@ commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
 commit_content="tree $tree_sha1
-author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
-committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0 +0000
 
 $commit_message"
 
@@ -304,7 +304,7 @@ type blob
 tag hellotag
 tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
 tag_description="This is a tag"
-tag_content="$tag_header_without_timestamp 0000000000 +0000
+tag_content="$tag_header_without_timestamp 0 +0000
 
 $tag_description"
 
-- 
2.39.1.616.gd06fca9e99
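
To make the "zero-padded" distinction from the commit message above
concrete, here is a small sketch. It assumes that "zero-padded" means a
leading '0' immediately followed by another digit, so that a lone "0" is
fine; is_zero_padded() is a hypothetical helper written for this example,
not a function taken from fsck.c.

```c
#include <ctype.h>

/*
 * Return 1 if the decimal timestamp at the start of `s` has leading
 * zeros (a '0' immediately followed by another digit), else 0.
 * A lone "0" followed by a non-digit is not considered padded.
 */
static int is_zero_padded(const char *s)
{
	return s[0] == '0' && isdigit((unsigned char)s[1]);
}
```

Under that assumption the old dummy value "0000000000 +0000" is flagged,
while the replacement "0 +0000" and an ordinary ten-digit timestamp are
not.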



* [PATCH 1/6] t1007: modernize malformed object tests
From: Jeff King @ 2023-01-18 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>

The tests in t1007 for detecting malformed objects have two
anachronisms:

 - they use "sha1" instead of "oid" in variable names, even though the
   script as a whole has been adapted to handle sha256

 - they use test_i18ngrep, which is no longer necessary

Since we'll be adding a new similar test, let's clean these up so they
are all consistently using the modern style.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1007-hash-object.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index ac5ad8c740..2d2148d8fa 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -203,23 +203,23 @@ done
 test_expect_success 'too-short tree' '
 	echo abc >malformed-tree &&
 	test_must_fail git hash-object -t tree malformed-tree 2>err &&
-	test_i18ngrep "too-short tree object" err
+	grep "too-short tree object" err
 '
 
 test_expect_success 'malformed mode in tree' '
-	hex_sha1=$(echo foo | git hash-object --stdin -w) &&
-	bin_sha1=$(echo $hex_sha1 | hex2oct) &&
-	printf "9100644 \0$bin_sha1" >tree-with-malformed-mode &&
+	hex_oid=$(echo foo | git hash-object --stdin -w) &&
+	bin_oid=$(echo $hex_oid | hex2oct) &&
+	printf "9100644 \0$bin_oid" >tree-with-malformed-mode &&
 	test_must_fail git hash-object -t tree tree-with-malformed-mode 2>err &&
-	test_i18ngrep "malformed mode in tree entry" err
+	grep "malformed mode in tree entry" err
 '
 
 test_expect_success 'empty filename in tree' '
-	hex_sha1=$(echo foo | git hash-object --stdin -w) &&
-	bin_sha1=$(echo $hex_sha1 | hex2oct) &&
-	printf "100644 \0$bin_sha1" >tree-with-empty-filename &&
+	hex_oid=$(echo foo | git hash-object --stdin -w) &&
+	bin_oid=$(echo $hex_oid | hex2oct) &&
+	printf "100644 \0$bin_oid" >tree-with-empty-filename &&
 	test_must_fail git hash-object -t tree tree-with-empty-filename 2>err &&
-	test_i18ngrep "empty filename in tree entry" err
+	grep "empty filename in tree entry" err
 '
 
 test_expect_success 'corrupt commit' '
-- 
2.39.1.616.gd06fca9e99



* [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Jeff King @ 2023-01-18 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Ævar Arnfjörð Bjarmason

Right now "git hash-object" will do some basic sanity checks of the
input using the usual parser code. This series teaches it to use the
fsck code instead, which should catch more things. See patch 6 for some
discussion of the implications.

The reason this is marked as an RFC is that at the end, compiling with
SANITIZE=address will provoke a failure in t3800. The issue is that
fsck_tag_standalone(), when fed a buffer/size combo, will look for a NUL
at the end of the headers, which might be buffer[size]. This is usually
OK for objects we've loaded from the odb, because we intentionally stick
an extra NUL at the end for safety. But here index_mem() may get an
arbitrary buffer.

I'm not sure yet of the right path forward. It's not too hard to add an
extra NUL in most cases, but one code path will mmap a file on disk. And
sticking a NUL there is hard (we already went down that road trying to
avoid REG_STARTEND for grep, and there wasn't a good solution).

The other option is having the fsck code avoid looking past the size it
was given. I think the intent is that this should work, from commits
like 4d0d89755e (Make sure fsck_commit_buffer() does not run out of the
buffer, 2014-09-11). We do use skip_prefix() and parse_oid_hex(), which
won't respect the size, but I think[1] that's OK because we'll have
parsed up to the end-of-header beforehand (and those functions would
never match past there).

Which would mean that 9a1a3a4d4c (mktag: allow omitting the header/body
\n separator, 2021-01-05) and acf9de4c94 (mktag: use fsck instead of
custom verify_tag(), 2021-01-05) were buggy, and we can just fix them.
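
As an illustration of the second option (never looking past the size we
were given), here is a minimal sketch of a bounded search for the end of
the header block. header_end() is a hypothetical helper written for this
mail, not something in fsck.c; the point is only that memchr() with an
explicit remaining length never touches buf[size], so the buffer does not
need a trailing NUL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset just past the blank line that ends the header
 * block, or `size` if the headers run to the end of the buffer.
 * Never reads buf[size] or anything beyond it.
 */
static size_t header_end(const char *buf, size_t size)
{
	size_t pos = 0;

	if (size > 0 && buf[0] == '\n')
		return 1;                       /* empty header block */

	while (pos < size) {
		const char *nl = memchr(buf + pos, '\n', size - pos);
		if (!nl)
			return size;            /* unterminated last line */
		pos = (size_t)(nl - buf) + 1;   /* start of the next line */
		if (pos < size && buf[pos] == '\n')
			return pos + 1;         /* found "\n\n" */
	}
	return size;
}
```

Any later per-line parsing would then have to stay within the returned
offset for the argument about skip_prefix() and parse_oid_hex() to hold.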

[1] But I said "I think" above because it can get pretty subtle. There's
    some more discussion in this thread:

      https://lore.kernel.org/git/20150625155128.C3E9738005C@gemini.denx.de/

    but I haven't yet convinced myself it's safe. This is exactly the
    kind of analysis I wish I had the power to nerd-snipe René into.

Anyway, here are the patches in the meantime. I do think this is a good
direction overall, modulo addressing the NUL-terminator question.

  [1/6]: t1007: modernize malformed object tests
  [2/6]: t1006: stop using 0-padded timestamps
  [3/6]: t7030: stop using invalid tag name
  [4/6]: t: use hash-object --literally when created malformed objects
  [5/6]: fsck: provide a function to fsck buffer without object struct
  [6/6]: hash-object: use fsck for object checks

 fsck.c                           | 29 ++++++++++-------
 fsck.h                           |  8 +++++
 object-file.c                    | 55 +++++++++++++-------------------
 t/t1006-cat-file.sh              |  6 ++--
 t/t1007-hash-object.sh           | 29 +++++++++++------
 t/t1450-fsck.sh                  | 28 ++++++++--------
 t/t4054-diff-bogus-tree.sh       |  2 +-
 t/t4058-diff-duplicates.sh       |  2 +-
 t/t4212-log-corrupt.sh           |  4 +--
 t/t5302-pack-index.sh            |  2 +-
 t/t5504-fetch-receive-strict.sh  |  2 +-
 t/t5702-protocol-v2.sh           |  2 +-
 t/t6300-for-each-ref.sh          |  2 +-
 t/t7030-verify-tag.sh            |  2 +-
 t/t7031-verify-tag-signed-ssh.sh |  2 +-
 t/t7509-commit-authorship.sh     |  2 +-
 t/t7510-signed-commit.sh         |  2 +-
 t/t7528-signed-commit-ssh.sh     |  2 +-
 t/t8003-blame-corner-cases.sh    |  2 +-
 t/t9350-fast-export.sh           |  2 +-
 20 files changed, 101 insertions(+), 84 deletions(-)

-Peff

^ permalink raw reply

* Re: [PATCH v4 15/19] object-file.c: release the "tag" in check_tag()
From: Jeff King @ 2023-01-18 19:17 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Eric Sunshine
In-Reply-To: <d30e2fb2-e9dc-20e0-0761-3b585a053586@web.de>

On Wed, Jan 18, 2023 at 07:54:41PM +0100, René Scharfe wrote:

> > Yes, but it does so with lookup_commit(), so the resulting commit
> > objects are themselves reachable from the usual obj_hash, and thus not
> > leaked.
> 
> The commit_list structures are leaked, no?

Ah, yeah, you're right.
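
A toy model of the distinction, for anyone following along. None of
this is git's actual code and every toy_* name is invented for
illustration: the structs returned by the lookup stay reachable from a
global table, while each list node is a separate allocation that the
caller has to free.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for git's structures; all names here are hypothetical. */
struct toy_commit {
	char id[41];
	struct toy_commit *next_interned;	/* chain in the global table */
};

struct toy_commit_list {
	struct toy_commit *item;
	struct toy_commit_list *next;
};

/* Global intern table: anything linked here stays reachable. */
static struct toy_commit *interned;

/*
 * Return the in-process struct for `id`, creating it on first use.
 * No object database is consulted, so dangling references are fine.
 */
struct toy_commit *toy_lookup_commit(const char *id)
{
	struct toy_commit *c;

	for (c = interned; c; c = c->next_interned)
		if (!strcmp(c->id, id))
			return c;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	strncpy(c->id, id, sizeof(c->id) - 1);
	c->next_interned = interned;
	interned = c;
	return c;
}

/* Prepend a parent; the node itself is owned by whoever holds the list. */
struct toy_commit_list *toy_list_insert(struct toy_commit *item,
					struct toy_commit_list *list)
{
	struct toy_commit_list *node = malloc(sizeof(*node));

	if (!node)
		return list;
	node->item = item;
	node->next = list;
	return node;
}

/* Free only the list nodes; the interned commits stay in the table. */
size_t toy_free_list(struct toy_commit_list *list)
{
	size_t freed = 0;

	while (list) {
		struct toy_commit_list *next = list->next;

		free(list);
		list = next;
		freed++;
	}
	return freed;
}
```

Skipping the toy_free_list() call is the analogue of the leak discussed
here: the commits are still found by a later lookup, but the nodes that
linked them as parents are lost.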

-Peff

^ permalink raw reply

* Re: [PATCH v4 15/19] object-file.c: release the "tag" in check_tag()
From: René Scharfe @ 2023-01-18 18:54 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Eric Sunshine
In-Reply-To: <Y8g84mABtIiHmxTI@coredump.intra.peff.net>

Am 18.01.23 um 19:39 schrieb Jeff King:
> On Wed, Jan 18, 2023 at 07:05:32PM +0100, René Scharfe wrote:
>
>>>   2. The point of this code is to find malformed input to hash-object.
>>>      We're probably better off feeding the buffer to fsck_commit(), etc.
>>>      It does more thorough checks, and these days it does not need an
>>>      object struct at all.
>>
>> I like the second one, as long as it won't check too much.  c879daa237
>> (Make hash-object more robust against malformed objects, 2011-02-05) added
>> the checks that are now in object-file.c and intended to only validate the
>> internal structure of objects, not relations between them.  It gave the example
>> to allow adding a commit before its tree, which should be allowed.  And
>> IIUC fsck_object() fits that bill.
>
> Yes, I think it will do the right thing here. And having just
> written up a quick series, the only tests which needed changes were ones
> with syntactic problems. :)
>
> I'll send it out in a few minutes.

Great! :)

>>> Either of which would naturally fix the leak for tags. I'm not sure
>>> there actually is a leak for commits, as commit structs don't store any
>>> strings themselves.
>>
>> parse_commit_buffer() allocates the list of parents.
>
> Yes, but it does so with lookup_commit(), so the resulting commit
> objects are themselves reachable from the usual obj_hash, and thus not
> leaked.

The commit_list structures are leaked, no?

>
>> Hmm, and it looks them up.  Doesn't this violate the goal to allow
>> dangling references?
>
> No, because lookup_commit() is just about creating an in-process struct.
> It doesn't look at the object database at all (though it would complain
> if we had seen the same oid in-process as another type).

Ah, good.

René

^ permalink raw reply

* Re: [PATCH] git: replace strbuf_addstr with strbuf_addch for all strings of length 2
From: Eric Sunshine @ 2023-01-18 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rose via GitGitGadget, git, Seija Kijin
In-Reply-To: <xmqqr0vr6d80.fsf@gitster.g>

On Wed, Jan 18, 2023 at 11:14 AM Junio C Hamano <gitster@pobox.com> wrote:
> > From: Seija Kijin <doremylover123@gmail.com>
> > This helps reduce overhead of calculating the length
>
> > diff --git a/builtin/am.c b/builtin/am.c
> >       strbuf_addstr(&sb, "GIT_AUTHOR_NAME=");
> >       sq_quote_buf(&sb, state->author_name);
> > -     strbuf_addch(&sb, '\n');
> >
> > -     strbuf_addstr(&sb, "GIT_AUTHOR_EMAIL=");
> > +     strbuf_addstr(&sb, "\nGIT_AUTHOR_EMAIL=");
>
> This may reduce the number of lines, but markedly worsens the
> readability of the resulting code.  Each of the three-line blocks in
> the original used to be a logically complete and independent unit, but
> now each of them depends on what the last block wants.

Very much agree with this and all your other review comments.

> > -             strbuf_addchars(dest, ' ', 2);
> > -             strbuf_addstr(dest, "From inner merge:");
> > +             strbuf_addstr(dest, "  From inner merge:");
> >               strbuf_addchars(dest, ' ', opt->priv->call_depth * 2);
>
> Ditto, even though this is not as horrible as the change to builtin/am.c
> we saw earlier.

Additionally, if this literal string ever gets wrapped in `_(...)`,
then the above change is even more undesirable due to the extra burden
it places on translators.

^ permalink raw reply

* Re: [PATCH v4 15/19] object-file.c: release the "tag" in check_tag()
From: Jeff King @ 2023-01-18 18:39 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Eric Sunshine
In-Reply-To: <fd883d86-0c85-6c72-a331-2e8b2064befe@web.de>

On Wed, Jan 18, 2023 at 07:05:32PM +0100, René Scharfe wrote:

> >   2. The point of this code is to find malformed input to hash-object.
> >      We're probably better off feeding the buffer to fsck_commit(), etc.
> >      It does more thorough checks, and these days it does not need an
> >      object struct at all.
> 
> I like the second one, as long as it won't check too much.  c879daa237
> (Make hash-object more robust against malformed objects, 2011-02-05) added
> the checks that are now in object-file.c and intended to only validate the
> internal structure of objects, not relations between them.  It gave the example
> to allow adding a commit before its tree, which should be allowed.  And
> IIUC fsck_object() fits that bill.

Yes, I think it will do the right thing here. And having just
written up a quick series, the only tests which needed changes were ones
with syntactic problems. :)

I'll send it out in a few minutes.

> > Either of which would naturally fix the leak for tags. I'm not sure
> > there actually is a leak for commits, as commit structs don't store any
> > strings themselves.
> 
> parse_commit_buffer() allocates the list of parents.

Yes, but it does so with lookup_commit(), so the resulting commit
objects are themselves reachable from the usual obj_hash, and thus not
leaked.

> Hmm, and it looks them up.  Doesn't this violate the goal to allow
> dangling references?

No, because lookup_commit() is just about creating an in-process struct.
It doesn't look at the object database at all (though it would complain
if we had seen the same oid in-process as another type).

-Peff

^ permalink raw reply

