Git development

Git development
 help / color / mirror / Atom feed

* [PATCH] diagnose: require repository
From: Martin Ågren @ 2023-10-14 13:53 UTC (permalink / raw)
  To: ks1322 ks1322; +Cc: git, Victoria Dye
In-Reply-To: <CAKFQ_Q9WjF9i-Rx2jdCw-adPVQrWNfNKrDY-em8Rpa5RNLXz4A@mail.gmail.com>

When `git diagnose` is run from outside a repo, it begins collecting
various information before eventually hitting a segmentation fault,
leaving an incomplete zip file behind.

Switch from the gentle setup to requiring a git directory. Without a git
repo, there isn't really much to diagnose.

We could possibly do a best-effort collection of information about the
machine and then give up. That would roughly be today's behavior but
with a controlled exit rather than a segfault. However, the purpose of
this tool is largely to create a zip archive. Rather than creating an
empty zip file or no zip file at all, and having to explain that
behavior, it seems more helpful to bail out clearly and early with a
succinct error message.

Reported-by: ks1322 ks1322 <ks1322@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Thanks for the report. This could be one way of fixing this.

 I haven't found anything in the original submission [1] discussing this
 "_GENTLY". I didn't see anything in the implementation or the tests
 suggesting that it was intentional to run outside a git repo.

 [1] https://lore.kernel.org/git/xmqqzgg1nz6v.fsf@gitster.g/t/#mc66904caab6bc79e57eaf5063df268b2725b6fcc

 t/t0092-diagnose.sh | 5 +++++
 git.c               | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/t/t0092-diagnose.sh b/t/t0092-diagnose.sh
index 133e5747d6..49671d35a2 100755
--- a/t/t0092-diagnose.sh
+++ b/t/t0092-diagnose.sh
@@ -5,6 +5,11 @@ test_description='git diagnose'
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh

+test_expect_success 'nothing to diagnose without repo' '
+	nongit test_must_fail git diagnose 2>err &&
+	grep "not a git repository" err
+'
+
 test_expect_success UNZIP 'creates diagnostics zip archive' '
 	test_when_finished rm -rf report &&

diff --git a/git.c b/git.c
index c67e44dd82..ff04a74bbd 100644
--- a/git.c
+++ b/git.c
@@ -525,7 +525,7 @@ static struct cmd_struct commands[] = {
 	{ "credential-cache--daemon", cmd_credential_cache_daemon },
 	{ "credential-store", cmd_credential_store },
 	{ "describe", cmd_describe, RUN_SETUP },
-	{ "diagnose", cmd_diagnose, RUN_SETUP_GENTLY },
+	{ "diagnose", cmd_diagnose, RUN_SETUP },
 	{ "diff", cmd_diff, NO_PARSEOPT },
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
-- 
2.42.0.399.g85a82e71e0

^ permalink raw reply related

* Bug: git diagnose crashes with Segmentation fault outside of git repository
From: ks1322 ks1322 @ 2023-10-14 11:59 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
Run `git diagnose` outside of any git repository

What did you expect to happen? (Expected behavior)
No crash, reasonable diagnose output

What happened instead? (Actual behavior)
`git diagnose` crashed with Segmentation fault

$ git diagnose
Collecting diagnostic info

git version 2.41.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
Repository root: (null)
Available space on '/tmp': 7.75 GiB (mount flags 0x6)
Segmentation fault (core dumped)

What's different between what you expected and what actually happened?
Segmentation fault, truncated output

Anything else you want to add:
`git diagnose` does not crash when run within some git repository

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.41.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 6.5.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct  6
19:02:35 UTC 2023 x86_64
compiler info: gnuc: 13.1
libc info: glibc: 2.37
$SHELL (typically, interactive shell): /bin/bash

[Enabled Hooks]
not run from a git repository - no hooks to show

^ permalink raw reply

* Re: [PATCH] bugreport: add 'seconds' to default outfile name
From: Kristoffer Haugsbakk @ 2023-10-14 10:35 UTC (permalink / raw)
  To: Jacob Stopak; +Cc: git
In-Reply-To: <20231014040101.8333-1-jacob@initialcommit.io>

Hi Jacob

On Sat, Oct 14, 2023, at 06:01, Jacob Stopak wrote:
> This patch adds the calendar second value to the default bugreport

Nitpick: you can just say “Add the calendar”. “This patch” is redundant.

See `Documentation/SubmittingPatches` at `imperative-mood`.

-- 
Kristoffer

^ permalink raw reply

* Re: [RFC] Define "precious" attribute and support it in `git clean`
From: Sebastian Thiel @ 2023-10-14  7:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Josh Triplett, Kristoffer Haugsbakk
In-Reply-To: <xmqqfs2e3292.fsf@gitster.g>

On 13 Oct 2023, at 18:39, Junio C Hamano wrote:

> Come to think of it, we might be able to retrofit '!' without too
> much damage.  Something like "!unignored" is now a deprecated but
> still supported way to say "!!unignored", "!*precious" is new, and
> "\!anything" is a pathname that begins with '!'.

I don't know anything about statistics, and I don't which of the proposed
syntax thus far has the lowest probability of accidental breakage, possibly
in combination with the best possible usability.

However, I do like even more the idea to retro-fit `!` instead of having an
entirely new prefix, it seems more intuitive to me.

An apparent disadvantage would be that using `!` prefix with
backwards-compatibility will make any additional future modifier more
breaking. For instance `!*` is potentially ignoring an additional file
in old git, and another `!-` modifier is having the same effect.

Chances for this are probably low though, but if in doubt it would be possible
to check certain patterns against all files of the top-3.5TB of
GitHub repositories.

Using `!*` to signal precious files also seems like a less likely
path prefix than `!$` would be, but then again, it's just a guess
which most definitely doesn't have much bearing.

I personally also like this more than using special comments as 'modifier',
even though doing so would probably have the lowest probability for
accidentally ignoring files in old git.

Maybe it's time to choose one of the options with the possibility to validate
it for accidental exclusion of files against the top 3.5TB of
GitHub repositories to be more sure it?

^ permalink raw reply

* Re: [RFC] Define "precious" attribute and support it in `git clean`
From: Josh Triplett @ 2023-10-14  5:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sebastian Thiel, git
In-Reply-To: <xmqqttqytnqb.fsf@gitster.g>

On Tue, Oct 10, 2023 at 10:02:20AM -0700, Junio C Hamano wrote:
> Sebastian Thiel <sebastian.thiel@icloud.com> writes:
> 
> > I'd like to propose adding a new standard gitattribute "precious".
> 
> ;-).
> 
> Over the years, I've seen many times scenarios that would have been
> helped if we had not just "tracked? ignored? unignored?" but also
> the fourth kind [*].  The word "ignored" (or "excluded") has always
> meant "not tracked, not to be tracked, and expendable" to Git, and
> "ignored but unexpendable" class was missing.  I even used the term
> "precious" myself in those discussions.  At the concept level, I
> support the effort 100%, but as always, the devil will be in the
> details.

"I've already wanted this for years" is, honestly, the best response we
could *possibly* have hoped for.

> Scenarios that people wished for "precious" traditionally have been
> 
>  * You are working on 'master'.  You have in your .gitignore or
>    .git/info/exclude a line to ignore path A, and have random
>    scribbles in a throw-away file there.  There is another branch
>    'seen', where they added some tracked contents at path A/B.  You
>    do "git checkout seen" and your file A that is an expendable file,
>    because it is listed as ignored in .git/info/exclude, is removed
>    to make room for creating A/B.

Ouch, I hadn't even thought about the issue of branch-switching
overwriting a file like that, but that's another great reason to have
"precious". (I've been thinking about "precious" as primarily to protect
files like `.config`, where they'd be unlikely to be checked in on any
branch because they have an established purpose in the project. Though,
of course, people *do* sometimes check in `.config` files in
special-purpose branches that aren't meant for upstreaming.)

>  * Similar situation, but this time, 'seen' branch added a tracked
>    contents at path A.  Again, "git checkout seen" will discard the
>    expendable file A and replace it with tracked contents.
> 
>  * Instead of "git checkout", you decide to merge the branch 'seen'
>    to the checkout of 'master', where you have an ignored path A.
>    Because merging 'seen' would need to bring the tracked contents
>    of either A/B (in the first scenario above) or A (in the second
>    scenario), your "expendable" A will be removed to make room.

+1

> In previous discussions, nobody was disturbed that "git clean" was
> unaware of the "precious" class, but if we were to have the
> "precious" class in addition to "ignored" aka "expendable", I would
> not oppose to teach "git clean" about it, too.
> 
> There was an early and rough design draft there in
> 
> https://lore.kernel.org/git/7vipsnar23.fsf@alter.siamese.dyndns.org/
> 
> which probably is worth a read, too.
> 
> Even though I referred to the precious _attribute_ in some of these
> discussions, between the attribute mechanism and the ignore
> mechanism, I am actually leaning toward suggesting to extend the
> exclude/ignore mechanism to introduce the "precious" class.  That
> way, we can avoid possible snafu arising from marking a path in
> .gitignore as ignored, and in .gitattrbutes as precious, and have to
> figure out how these two settings are to work together.

Sounds reasonable.

> In any case, the "precious" paths are expected to be small minority
> of what people never want to "git add" or "git commit", so coming up
> with a special syntax to be used in .gitignore, even if that special
> syntax is ugly and cumbersome to type, would be perfectly OK.

[Following up both to this and to Sebastian's response.]

One potentially important question: should the behavior of old git be to
treat precious files as ignored, or as not-ignored? If the syntax were
something like

$.config

then old git would treat the file as not-ignored. If the syntax were
something like

$precious
.config

then old git would treat the file as ignored.

Seems like it would be obtrusive if `git status` in old git showed the
file, and `git add .` in old git added it.

^ permalink raw reply

* [PATCH] bugreport: add 'seconds' to default outfile name
From: Jacob Stopak @ 2023-10-14  4:01 UTC (permalink / raw)
  To: git; +Cc: Jacob Stopak

Currently, git bugreport postfixes the default bugreport filename (and
diagnostics zip filename if --diagnose is supplied) with the current
calendar hour and minute values, assuming the -s flag is absent.

If a user runs the bugreport command more than once within a calendar
minute, a filename conflict with an existing file occurs and the program
errors, since the new output filename was already used for the previous
file. If the user waits anywhere from 1 to 60 seconds (depending on
_when_ during the calendar minute the first command was run) the command
works again with no error since the default filename is now unique, and
multiple bug reports are able to be created with default settings.

This is a minor thing but can cause confusion especially for first time
users of the bugreport command, who are likely to run it multiple times
in quick succession to learn how it works, (like I did).

This patch adds the calendar second value to the default bugreport
filename. This technically just shortens the window during which the
issue occurs, but a single second is a small enough time increment that
users will avoid the filename conflict in practice in this scenario.

This means the user will end up with multiple bugreport files being
created if they run the command multiple times quickly, but that feels
more intuitive and consistent than an error arbitrarily occuring within
a calendar minute, especially given that the time window in which the
error currently occurs is variable as described above.

Signed-off-by: Jacob Stopak <jacob@initialcommit.io>
---
 builtin/bugreport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index d2ae5c305d..b556c6e135 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -106,7 +106,7 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	struct tm tm;
 	enum diagnose_mode diagnose = DIAGNOSE_NONE;
 	char *option_output = NULL;
-	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *option_suffix = "%Y-%m-%d-%H%M%S";
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
 	size_t output_path_len;

base-commit: 493f4622739e9b64f24b465b21aa85870dd9dc09
-- 
2.42.0.297.g393e7d1581

^ permalink raw reply related

* [PATCH 21/20] t5319: make corrupted large-offset test more robust
From: Jeff King @ 2023-10-14  0:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Taylor Blau
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

On Mon, Oct 09, 2023 at 04:55:44PM -0400, Jeff King wrote:

>   [10/20]: midx: bounds-check large offset chunk

Here's one more patch on top that fixes a flaky test. Sorry not to have
caught it before sending out the original series. I did see a flaky CI
run before then, but I tweaked the test with what I thought was the
solution, and then it didn't re-occur until I ran CI again today.

I'm pretty sure this should fix it for good.

(I also saw Taylor posted some further patches; I haven't looked closely
yet, this should be orthogonal to those).

-- >8 --
Subject: [PATCH] t5319: make corrupted large-offset test more robust

The test t5319.88 ("reader bounds-checks large offset table") can fail
intermittently. The failure mode looks like this:

  1. An earlier test sets up "objects64", a directory that can be used
     to produce a midx with a corrupted large-offsets table. To get the
     large offsets, it corrupts the normal ".idx" file to have a fake
     large offset, and then builds a midx from that.

     That midx now has a large offset table, which is what we want. But
     we also have a .idx on disk that has a corrupted entry. We'll call
     the object with the corrupted large-offset "X".

  2. In t5319.88, we further corrupt the midx by reducing the size of
     the large-offset chunk (because our goal is to make sure we do not
     do an out-of-bounds read on it).

  3. We then enumerate all of the objects with "cat-file --batch-check
     --batch-all-objects", expecting to see a complaint when we try to
     show object X. We use --batch-all-objects because our objects64
     repo doesn't actually have any refs (but if we check them all, one
     of them will be the failing one). The default batch-check format
     includes %(objecttype) and %(objectsize), both of which require us
     to access the actual pack data (and thus requires looking at the
     offset).

  4a. Usually, this succeeds. We try to output object X, do a lookup via
      the midx for the type/size lookup, and run into the corrupt
      large-offset table.

  4b. But sometimes we hit a different error. If another object points
      to X as a delta base, then trying to find the type of that object
      requires walking the delta chain to the base entry (since only the
      base has the concrete type; deltas themselves are either OFS_DELTA
      or REF_DELTA).

      Normally this would not require separate offset lookups at all, as
      deltas are usually stored as OFS_DELTA, specifying the relative
      offset to the base. But the corrupt idx created in step 1 is done
      directly with "git pack-objects" and does not pass the
      --delta-base-offset option, meaning we have REF_DELTA entries!
      Those do have to consult an index to find the location of the base
      object, and they use the pack .idx to do this. The same pack .idx
      that we know is corrupted from step 1!

      Git does notice the error, but it does so by seeing the corrupt
      .idx file, not the corrupt midx file, and the error it reports is
      different, causing the test to fail.

The set of objects created in the test is deterministic. But the delta
selection seems not to be (which is not too surprising, as it is
multi-threaded). I have seen the failure in Windows CI but haven't
reproduced it locally (not even with --stress). Re-running a failed
Windows CI job tends to work. But when I download and examine the trash
directory from a failed run, it shows a different set of deltas than I
get locally. But the exact source of non-determinism isn't that
important; our test should be robust against any order.

There are a few options to fix this:

  a. It would be OK for the "objects64" setup to "unbreak" the .idx file
     after generating the midx. But then it would be hard for subsequent
     tests to reuse it, since it is the corrupted idx that forces the
     midx to have a large offset table.

  b. The "objects64" setup could use --delta-base-offset. This would fix
     our problem, but earlier tests have many hard-coded offsets. Using
     OFS_DELTA would change the locations of objects in the pack (this
     might even be OK because I think most of the offsets are within the
     .idx file, but it seems brittle and I'm afraid to touch it).

  c. Our cat-file output is in oid order by default. Since we store
     bases before deltas, if we went in pack order (using the
     "--unordered" flag), we'd always see our corrupt X before any delta
     which depends on it. But using "--unordered" means we skip the midx
     entirely. That makes sense, since it is just enumerating all of
     the packs, using the offsets found in their .idx files directly.
     So it doesn't work for our test.

  d. We could ask directly about object X, rather than enumerating all
     of them. But that requires further hard-coding of the oid (both
     sha1 and sha256) of object X. I'd prefer not to introduce more
     brittleness.

  e. We can use a --batch-check format that looks at the pack data, but
     doesn't have to chase deltas. The problem in this case is
     %(objecttype), which has to walk to the base. But %(objectsize)
     does not; we can get the value directly from the delta itself.
     Another option would be %(deltabase), where we report the REF_DELTA
     name but don't look at its data.

I've gone with option (e) here. It's kind of subtle, but it's simple and
has no side effects.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5319-multi-pack-index.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 2a11dd1af6..d3c9e97feb 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1129,8 +1129,10 @@ test_expect_success 'reader bounds-checks large offset table' '
 		git multi-pack-index --object-dir=../objects64 write &&
 		midx=../objects64/pack/multi-pack-index &&
 		corrupt_chunk_file $midx LOFF clear &&
-		test_must_fail git cat-file \
-			--batch-check --batch-all-objects 2>err &&
+		# using only %(objectsize) is important here; see the commit
+		# message for more details
+		test_must_fail git cat-file --batch-all-objects \
+			--batch-check="%(objectsize)" 2>err &&
 		cat >expect <<-\EOF &&
 		fatal: multi-pack-index large offset out of bounds
 		EOF
-- 
2.42.0.937.g4d9eb42d36


^ permalink raw reply related

* Re: [PATCH] Add x64 SHA-NI implementation to sha1 and sha256 in git ?
From: Junio C Hamano @ 2023-10-14  0:29 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Feiyang Xue, git
In-Reply-To: <ZSmxFopwJyBGmZY-@tapette.crustytoothpaste.net>

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> If we ship the code, then someone has to maintain it, and I don't know
> if we have any assembly experts.  We might also have a bug that produced
> incorrect results, which would be pretty disastrous for the affected
> users.  It's much better for maintainability if we push that off onto
> the cryptographic library maintainers who are much more competent in
> that regard than I am (I will not speak for others) and let distro
> maintainers simply link the appropriate library, which, as I said above,
> they're already effectively doing.

Thanks for a well reasoned response.  Very much appreciated.

^ permalink raw reply

* Re: [PATCH] Add x64 SHA-NI implementation to sha1 and sha256 in git ?
From: brian m. carlson @ 2023-10-13 21:05 UTC (permalink / raw)
  To: Feiyang Xue; +Cc: git
In-Reply-To: <20231012200447.3553757-1-hi@fxue.dev>

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

On 2023-10-12 at 20:04:47, Feiyang Xue wrote:
> Looking for comments on if this is a good idea to add SHA-NI option to git,
> And if so, what'd be a good implementation.
> 
> The attached patch is more of a proof of concept,  using a sha-ni example
> found on internet and have it tapped into git's "hash-ll.h" when it sees make
> flags "SHA1_SHANI=1" and/or "SHA256_SHANI=1" for sha1/sha256 respectively.
> 
> "git hash-object" is about 3-ish times faster for me

You almost certainly don't want to use SHA-1 using native instructions
because it doesn't detect collisions, unlike the default implementation
(SHA-1-DC).  Since SHA-1 collisions are relatively cheap (less than USD
45,000) to make, not detecting collisions would be a security issue
worthy of a CVE.

It's fine for SHA-256, but see below.

> There are few topics that i'm uncertain about.
> 
> 1. Is it good idea to have machine-specific asm codes in git. I read there was
> commit fd1ec82547 which removed assembly implementation for PPC_SHA1. Does that
> imply maintainers doesn't want machine-specific assembly in source?

I'd prefer we didn't.

Nettle has SHA-NI extensions on systems that support it, and so does
OpenSSL.  OpenSSL can't be used by distros for licensing reasons, but
Nettle can, and both of those libraries automatically detect the best
code to use based on the chip.  One of those libraries is almost always
going to be linked into Git at some point because of libcurl anyway.

If we ship the code, then someone has to maintain it, and I don't know
if we have any assembly experts.  We might also have a bug that produced
incorrect results, which would be pretty disastrous for the affected
users.  It's much better for maintainability if we push that off onto
the cryptographic library maintainers who are much more competent in
that regard than I am (I will not speak for others) and let distro
maintainers simply link the appropriate library, which, as I said above,
they're already effectively doing.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: [PATCH 6/8] midx: read `OIDF` chunk with `pair_chunk_expect()`
From: Junio C Hamano @ 2023-10-13 21:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King
In-Reply-To: <b39203b32c24772d5d8a257c4f647b7d9bd92d53.1697225110.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> +	if (pair_chunk_expect(cf, MIDX_CHUNKID_OIDFANOUT,
> +			      (const unsigned char **)&m->chunk_oid_fanout,
> +			      256 * sizeof(uint32_t))) {
> +		error(_("multi-pack-index OID fanout is of the wrong size"));
>  		die(_("multi-pack-index required OID fanout chunk missing or corrupted"));
> +	}

This is not a new problem, but when laid out this way, the doubled
messages look a bit suboptimal.

Together with reporting the actual and expected byte counts and
doing so consistenly I alluded to in a separate message, cleaning
these up should probably be left outside of the topic, I suspect.

> +	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
>  	if (read_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, midx_read_oid_lookup, m))
>  		die(_("multi-pack-index required OID lookup chunk missing or corrupted"));
>  	if (read_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_object_offsets, m))

^ permalink raw reply

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
From: David Rogers @ 2023-10-13 20:49 UTC (permalink / raw)
  To: git

Git repositories have become indispensable resources for citing parts
of a development history with links.  However, the format of git
remote entries is not always distinguishable from other types of
citation -- for example a git reference vs. a plain URL.

Rather than rely on context to tell me that `https://github.com/git/git`
refers to a git repository which I could clone with git over https, it
would be nice to use a url like `git+https://github.com/git/git`
or even
`git+https://github.com/git/git?commit=d0e8084c65cbf949038ae4cc344ac2c2efd77415`
to unambiguously specify that the type of data and its method of
access are native to git.

This issue is extremely important for version control systems which
build dependency lists from git, e.g.
https://pip.pypa.io/en/stable/topics/vcs-support/

That project lists several invented URL schemes (all beginning with
git+) and assigning special reserved characters
(https://datatracker.ietf.org/doc/html/rfc3986#section-2.2)
git+https://git.example.com/MyProject.git@master
git+https://git.example.com/MyProject.git@v1.0
git+https://git.example.com/MyProject.git@da39a3ee5e6b4b0d3255bfef95601890afd80709
git+https://git.example.com/MyProject.git@refs/pull/123/head

It would be helpful for the git project itself to define its own URL
scheme to codify these use cases and, possibly in addition, provide a
standard way to reference within git repositories.

For reference, some of the ways URLs are already used/defined within
git are documented here:
- https://github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/connect.c#L107
  (alternately, using gitweb syntax not actually available on github,
https://github.com/git/git.git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415:/git/connect.c)
- https://mirrors.edge.kernel.org/pub/software/scm/git/docs/gitremote-helpers.html
- https://git-scm.com/docs/git-http-backend
- https://git-scm.com/docs/gitweb

Currently, a comment in connect.c notes "git+" schemes were
deprecated.  However, I would argue that at a minimum, these "git+"
schemes should be a supported and documented feature of git.  Also,
something has to be fixed (or better communicated) about URLs of the
form "git@github.com:user/project.git"  These are implicitly treated
as "git+ssh://git@github.com/user/project.git", but the use of ":" is
confusing from the perspective of translating between these two forms.

In addition, the use of paths, queries, and fragments should be
considered to allow (IMHO) at least 3 distinct uses:
1. naming commit-ish objects (and potentially metadata like author and
parents within the commit)
2. naming tree-ish objects and paths within them
3. naming blobs (and potentially fragment identifiers like lines or
HTML tags within those blobs)

These further refinements don't have to be supported by any special
functions within git.  However, their existence may influence git data
structures and api-s in the future.

The last discussion I can find of this issue on the git mailing list
(https://lore.kernel.org/git/C9Y2DPYH4XO1.3KFD8LT770P2@taiga)
indicates that defining conventions like these within git's
documentation would be a good place to start.  On a separate thread, I
will send a draft "git+" URI naming scheme for discussion and eventual
submission to IANA
(https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml).

~ David M. Rogers

^ permalink raw reply

* Re: [PATCH v5 0/2] attr: add attr.tree config
From: Junio C Hamano @ 2023-10-13 20:47 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, Jeff King, Jonathan Tan, Eric Sunshine, John Cai
In-Reply-To: <xmqqa5smz2lk.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> So in a sense, for !!ignore_bad_attr_tree case, the code ends up
> doing the right thing.  But if !ignore_bad_attr_tree is true, i.e.,
> a blob object name is given via --attr-source or GIT_ATTR_SOURCE,
> then the bug will be uncovered.

Having said all that, I suspect that this problem is not new and
certainly not caused by this topic.  We should have unconditionally
died when GIT_ATTR_SOURCE gave a blob object name, but pretended as
if an empty tree was given.  There may even be existing users who
now assume that is working as intended and depend on this bug.

So, let's leave it as a "possible bug" that we might want to fix in
the future, outside the scope of this series.

Thanks.


>  t/t0003-attributes.sh | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh
> index ecf43ab545..0f02f22171 100755
> --- c/t/t0003-attributes.sh
> +++ w/t/t0003-attributes.sh
> @@ -394,6 +394,18 @@ test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--attr-source that points at a non-treeish' '
> +	test_when_finished rm -rf empty &&
> +	git init empty &&
> +	(
> +		cd empty &&
> +		echo "$bad_attr_source_err" >expect_err &&
> +		H=$(git hash-object -t blob --stdin -w </dev/null) &&
> +		test_must_fail git --attr-source=$H check-attr test -- f/path 2>err &&
> +		test_cmp expect_err err
> +	)
> +'
> +
>  test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tree' '
>  	test_when_finished rm -rf empty &&
>  	git init empty &&

^ permalink raw reply

* Re: [PATCH v2 1/2] t: add a test helper to truncate files
From: Junio C Hamano @ 2023-10-13 20:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Jason Hatton
In-Reply-To: <ZSmmD5sz7TPWCGRd@tapette.crustytoothpaste.net>

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-10-12 at 22:52:59, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> 
>> > +	fd = open(argv[1], O_WRONLY | O_CREAT, 0600);
>> > +	if (fd < 0)
>> > +		die_errno("failed to open file %s", argv[1]);
>> 
>> contrib/coccinelle/xopen.cocci tells us to write this simply as
>> 
>> 	fd = xopen(argv[1], O_WRONLY | O_CREAT, 0600);
>
> Sure, I can do that.

Unless there are other changes needed, I'll handle it on this end.

Thanks.

^ permalink raw reply

* Re: [PATCH v5 0/2] attr: add attr.tree config
From: Junio C Hamano @ 2023-10-13 20:31 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, Jeff King, Jonathan Tan, Eric Sunshine, John Cai
In-Reply-To: <xmqqmswmz76w.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> 	if (repo_get_oid_treeish(the_repository,
> 				 default_attr_source_tree_object_name,
> 				 attr_source) && !ignore_bad_attr_tree)
> 		die(_("bad --attr-source or GIT_ATTR_SOURCE"));
>
> OOPS!  Sorry for not noticing earlier, but repo_get_oid_treeish()
> does *NOT* error out when the discovered object is not a treeish, as
> the suggested object type is merely supplied for disambiguation
> purposes (e.g., with objects 012345 that is a tree and 012346 that
> is a blob, you can still ask for treeish "01234" but if you ask for
> an object "01234" it will fail).
>
> So, the alternative test would have caught this bug, no?  Instead of
> silently treating the non-treeish as an empty tree, we would have
> died much later when the object supposedly a tree-ish turns out to
> be a blob, or something?

There indeed is a bug, but not really.  If we add this test:

test_expect_success 'attr.tree that points at a non-treeish' '
	test_when_finished rm -rf empty &&
	git init empty &&
	(
		cd empty &&
		echo "f/path: test: unspecified" >expect &&
		H=$(git hash-object -t blob --stdin -w </dev/null) &&
		git -c attr.tree=$H check-attr test -- f/path >actual 2>err &&
		test_must_be_empty err &&
		test_cmp expect actual
	)
'

repo_get_oid_treeish() returns a blob object name and we end up
storing a blob object name in "attr_source" static variable of
default_attr_source() function.

Later this is fed to read_attr() by bootstrap_attr_stack() and then
to read_attr_from_blob() that uses it to call get_tree_entry(),
which fails for any path because it is a blob.  We do not give any
errors or error messages during the whole process.

So in a sense, for !!ignore_bad_attr_tree case, the code ends up
doing the right thing.  But if !ignore_bad_attr_tree is true, i.e.,
a blob object name is given via --attr-source or GIT_ATTR_SOURCE,
then the bug will be uncovered.

 t/t0003-attributes.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git c/t/t0003-attributes.sh w/t/t0003-attributes.sh
index ecf43ab545..0f02f22171 100755
--- c/t/t0003-attributes.sh
+++ w/t/t0003-attributes.sh
@@ -394,6 +394,18 @@ test_expect_success 'bare repo defaults to reading .gitattributes from HEAD' '
 	test_cmp expect actual
 '
 
+test_expect_success '--attr-source that points at a non-treeish' '
+	test_when_finished rm -rf empty &&
+	git init empty &&
+	(
+		cd empty &&
+		echo "$bad_attr_source_err" >expect_err &&
+		H=$(git hash-object -t blob --stdin -w </dev/null) &&
+		test_must_fail git --attr-source=$H check-attr test -- f/path 2>err &&
+		test_cmp expect_err err
+	)
+'
+
 test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tree' '
 	test_when_finished rm -rf empty &&
 	git init empty &&


^ permalink raw reply related

* Re: [PATCH] send-email: move process_address_list earlier to avoid, uninitialized address error
From: Michael Strawbridge @ 2023-10-13 20:25 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Bagas Sanjaya, Todd Zullinger, Luben Tuikov,
	Ævar Arnfjörð Bjarmason, Taylor Blau,
	Git Mailing List, Uwe Kleine-König
In-Reply-To: <20231011224753.GE518221@coredump.intra.peff.net>


On 10/11/23 18:47, Jeff King wrote:
> On Wed, Oct 11, 2023 at 03:37:39PM -0700, Junio C Hamano wrote:
>
>> On the other hand, I am not sure what is wrong with "after the user
>> typed", actually.  As you said, anybody sane would be using --to (or
>> an equivalent configuration variable in the repository) to send
>> their patches to the project address instead of typing, and to them
>> it is not a problem.  After getting the recipient address from the
>> end user, the validation may fail due to a wrong address, in which
>> case it is a good thing.  If the validation failed due to wrong
>> contents of the patch (perhaps it included a change to the file with
>> trade secret that appeared in the context lines), as long as the
>> reason why the validation hook rejected the patches is clear enough
>> (e.g., "it's the patches, not the recipients"), such "a rejection
>> after typing" would be only once per a patch series, so it does not
>> sound too bad, either.
>>
>> But perhaps I am not seeing the reason why "fail after the user typed"
>> is so disliked and being unnecessarily unsympathetic.  I dunno.
> I did not look carefully at the flow of send-email, so this may or may
> not be an issue. But what I think would be _really_ annoying is if you
> asked to write a cover letter, went through the trouble of writing it,
> and then send-email bailed due to some validation failure that could
> have been checked earlier.
>
> There is probably a way to recover your work (presumably we leave it in
> a temporary file somewhere), but it may not be entirely trivial,
> especially for users who are not comfortable with advanced usage of
> their editor. ;)
As I was looking at covering the case of interactive input (--compose) to the fix I noticed that this seems to be at least partly handled by the $compose_filename code.  There is a nice output message telling you exactly where the intermediate version of the email you are composing is located if there are errors.  I took a quick look inside and can verify that any lost work should be minimal as long as someone knows how to edit files with their editor of choice.
>
> I seem to remember we had one or two such problems in the early days
> with "git commit", where you would go to the trouble to type a commit
> message only to bail on some condition which _could_ have been checked
> earlier. You can recover the message from .git/COMMIT_EDITMSG, but you
> need to remember to do so before re-invoking "git commit", otherwise it
> gets obliterated.
>
> Now for send-email, if your flow is to generate the patches with
> "format-patch", then edit the cover letter separately, and then finally
> ship it all out with "send-email", that might not be an issue. But some
> workflows use the --compose option instead.
>
> -Peff
I have been looking into handling the interactive input cases while solving this issue, but have yet to make a breakthrough.  Simply moving the validation code below the original process_address_list code results in a a scenario where I get the email address being seen as something like "ARRAY (0x55ddb951d768)" rather than the email address I wrote in the compose buffer.

^ permalink raw reply

* Re: [PATCH v2 1/2] t: add a test helper to truncate files
From: brian m. carlson @ 2023-10-13 20:23 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, Junio C Hamano, Taylor Blau, Jason Hatton
In-Reply-To: <CAPig+cSUvyCS-NOYOoJAmg7LGyU5Dqky5HyS-QTNW1QoHj-0bA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

On 2023-10-12 at 17:49:13, Eric Sunshine wrote:
> > +       sz = strtoumax(argv[2], &p, 0);
> > +       if (*p)
> > +               die("invalid size");
> 
> Do you want to check 'errno' here, as well (probably before the '*p' check)?
> 
> Or is that being too defensive for a 'test-tool' command?

I don't believe that's necessary.  The Linux manual page leads me to
believe that if *argv[2] is not 0 but *p is 0, then the entire string is
valid, which would imply that errno is not set.  I'm happy to ignore for
the moment the case where the user specifies "" as the argument, because
it is a test helper and "don't do weird, unexpected things with the test
helper without looking at the source code first" is legitimate advice
for Git developers.

If this were user-facing, improved robustness would be warranted, I
agree.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/2] t: add a test helper to truncate files
From: brian m. carlson @ 2023-10-13 20:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Jason Hatton
In-Reply-To: <xmqqsf6f312s.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

On 2023-10-12 at 22:52:59, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > +	fd = open(argv[1], O_WRONLY | O_CREAT, 0600);
> > +	if (fd < 0)
> > +		die_errno("failed to open file %s", argv[1]);
> 
> contrib/coccinelle/xopen.cocci tells us to write this simply as
> 
> 	fd = xopen(argv[1], O_WRONLY | O_CREAT, 0600);

Sure, I can do that.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Dragan Simic @ 2023-10-13 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Christoph Anton Mitterer, git
In-Reply-To: <xmqqr0lzhkzk.fsf@gitster.g>

On 2023-10-12 18:19, Junio C Hamano wrote:
> Dragan Simic <dsimic@manjaro.org> writes:
> 
>> Please note that dropping "-X" and leaving "-F" would actually
>> introduce the inconsistency that I already mentioned.  To reiterate,
>> short outputs would then remain displayed on screen, while long
>> outputs would disappear after exiting less(1).
> 
> Good point.

I've been thinking about this, and a rather elegant, backward-compatible 
solution is possible, but it requires some improvements to be made to 
less(1) first.  I'll reach out to the author of less(1) and propose that 
new feature, and I'll let you know his opinion about it.

^ permalink raw reply

* Re: [PATCH 5/8] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <45cac29403e63483951f7766c6da3c022c68d9f0.1697225110.git.me@ttaylorr.com>

On Fri, Oct 13, 2023 at 03:25:30PM -0400, Taylor Blau wrote:
> Perform an identical conversion as in previous commits to read the BIDX
> chunk.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)

Oops. This fails t4216 because it changes the warning() message, which I
thought I excluded from this patch :-<.

Here is a replacement that passes the test. I can reroll the entire
"series" if we decide that this is a worthwhile direction to go in:

--- 8< ---

Subject: [PATCH] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`

Perform an identical conversion as in previous commits to read the BIDX
chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0fab99f5dd..66c2e628d8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,18 +330,6 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }

-static int graph_read_bloom_index(const unsigned char *chunk_start,
-				  size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * 4) {
-		warning("commit-graph changed-path index chunk is too small");
-		return -1;
-	}
-	g->chunk_bloom_indexes = chunk_start;
-	return 0;
-}
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}

 	if (s->commit_graph_read_changed_paths) {
-		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-			   graph_read_bloom_index, graph);
+		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+				      &graph->chunk_bloom_indexes,
+				      st_mult(graph->num_commits, 4)) == -1)
+			warning(_("commit-graph changed-path index chunk is too small"));
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
 	}
--
2.38.0.16.g393fd4c6db


^ permalink raw reply related

* [PATCH 8/8] midx: read `OOFF` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform an identical conversion as in previous commits to read the
OOFF chunk in the MIDX machinery.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/midx.c b/midx.c
index 74167b8fdb..4aeadd5e00 100644
--- a/midx.c
+++ b/midx.c
@@ -61,19 +61,6 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
 	strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
 }
 
-static int midx_read_object_offsets(const unsigned char *chunk_start,
-				    size_t chunk_size, void *data)
-{
-	struct multi_pack_index *m = data;
-	m->chunk_object_offsets = chunk_start;
-
-	if (chunk_size != st_mult(m->num_objects, MIDX_CHUNK_OFFSET_WIDTH)) {
-		error(_("multi-pack-index object offset chunk is the wrong size"));
-		return 1;
-	}
-	return 0;
-}
-
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -158,8 +145,12 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 		error(_("multi-pack-index OID lookup chunk is the wrong size"));
 		die(_("multi-pack-index required OID lookup chunk missing or corrupted"));
 	}
-	if (read_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_object_offsets, m))
+	if (pair_chunk_expect(cf, MIDX_CHUNKID_OBJECTOFFSETS,
+			      &m->chunk_object_offsets,
+			      st_mult(m->num_objects, MIDX_CHUNK_OFFSET_WIDTH))) {
+		error(_("multi-pack-index object offset chunk is the wrong size"));
 		die(_("multi-pack-index required object offsets chunk missing or corrupted"));
+	}
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
 		   &m->chunk_large_offsets_len);
-- 
2.42.0.352.gd9c5062ff7.dirty

^ permalink raw reply related

* [PATCH 7/8] midx: read `OIDL` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform an identical conversion as in previous commits to read the
OIDL chunk in the MIDX machinery.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/midx.c b/midx.c
index 38bf816cce..74167b8fdb 100644
--- a/midx.c
+++ b/midx.c
@@ -61,19 +61,6 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
 	strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
 }
 
-static int midx_read_oid_lookup(const unsigned char *chunk_start,
-				size_t chunk_size, void *data)
-{
-	struct multi_pack_index *m = data;
-	m->chunk_oid_lookup = chunk_start;
-
-	if (chunk_size != st_mult(m->hash_len, m->num_objects)) {
-		error(_("multi-pack-index OID lookup chunk is the wrong size"));
-		return 1;
-	}
-	return 0;
-}
-
 static int midx_read_object_offsets(const unsigned char *chunk_start,
 				    size_t chunk_size, void *data)
 {
@@ -166,8 +153,11 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 		die(_("multi-pack-index required OID fanout chunk missing or corrupted"));
 	}
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
-	if (read_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, midx_read_oid_lookup, m))
+	if (pair_chunk_expect(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup,
+			      st_mult(m->hash_len, m->num_objects))) {
+		error(_("multi-pack-index OID lookup chunk is the wrong size"));
 		die(_("multi-pack-index required OID lookup chunk missing or corrupted"));
+	}
 	if (read_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_object_offsets, m))
 		die(_("multi-pack-index required object offsets chunk missing or corrupted"));
 
-- 
2.42.0.352.gd9c5062ff7.dirty


^ permalink raw reply related

* [PATCH 6/8] midx: read `OIDF` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform an identical conversion as in previous commits to read the
OIDF chunk in the MIDX machinery.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/midx.c b/midx.c
index 2f3863c936..38bf816cce 100644
--- a/midx.c
+++ b/midx.c
@@ -61,20 +61,6 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
 	strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
 }
 
-static int midx_read_oid_fanout(const unsigned char *chunk_start,
-				size_t chunk_size, void *data)
-{
-	struct multi_pack_index *m = data;
-	m->chunk_oid_fanout = (uint32_t *)chunk_start;
-
-	if (chunk_size != 4 * 256) {
-		error(_("multi-pack-index OID fanout is of the wrong size"));
-		return 1;
-	}
-	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
-	return 0;
-}
-
 static int midx_read_oid_lookup(const unsigned char *chunk_start,
 				size_t chunk_size, void *data)
 {
@@ -173,8 +159,13 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names, &m->chunk_pack_names_len))
 		die(_("multi-pack-index required pack-name chunk missing or corrupted"));
-	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m))
+	if (pair_chunk_expect(cf, MIDX_CHUNKID_OIDFANOUT,
+			      (const unsigned char **)&m->chunk_oid_fanout,
+			      256 * sizeof(uint32_t))) {
+		error(_("multi-pack-index OID fanout is of the wrong size"));
 		die(_("multi-pack-index required OID fanout chunk missing or corrupted"));
+	}
+	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 	if (read_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, midx_read_oid_lookup, m))
 		die(_("multi-pack-index required OID lookup chunk missing or corrupted"));
 	if (read_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_object_offsets, m))
-- 
2.42.0.352.gd9c5062ff7.dirty


^ permalink raw reply related

* [PATCH 5/8] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform an identical conversion as in previous commits to read the BIDX
chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0fab99f5dd..ad98f6334d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,18 +330,6 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-static int graph_read_bloom_index(const unsigned char *chunk_start,
-				  size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * 4) {
-		warning("commit-graph changed-path index chunk is too small");
-		return -1;
-	}
-	g->chunk_bloom_indexes = chunk_start;
-	return 0;
-}
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_read_changed_paths) {
-		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-			   graph_read_bloom_index, graph);
+		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+				      &graph->chunk_bloom_indexes,
+				      st_mult(graph->num_commits, 4)) == -1)
+			warning(_("commit-graph changed-path index chunk is too small (%d)"), graph->num_commits * 4);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
 	}
-- 
2.42.0.352.gd9c5062ff7.dirty


^ permalink raw reply related

* [PATCH 4/8] commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform an identical conversion as in previous commits to read the GDAT
chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 97d4824673..0fab99f5dd 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,16 +330,6 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-static int graph_read_generation_data(const unsigned char *chunk_start,
-				      size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * sizeof(uint32_t))
-		return error("commit-graph generations chunk is wrong size");
-	g->chunk_generation_data = chunk_start;
-	return 0;
-}
-
 static int graph_read_bloom_index(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -457,8 +447,11 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 		   &graph->chunk_base_graphs_size);
 
 	if (s->commit_graph_generation_version >= 2) {
-		read_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
-			   graph_read_generation_data, graph);
+		if (pair_chunk_expect(cf, GRAPH_CHUNKID_GENERATION_DATA,
+				      &graph->chunk_generation_data,
+				      st_mult(graph->num_commits,
+					      sizeof(uint32_t))))
+			error(_("commit-graph generations chunk is wrong size"));
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			   &graph->chunk_generation_data_overflow,
 			   &graph->chunk_generation_data_overflow_size);
-- 
2.42.0.352.gd9c5062ff7.dirty


^ permalink raw reply related

* [PATCH 3/8] commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
From: Taylor Blau @ 2023-10-13 19:25 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

Perform a similar conversion as in the previous commit read the CDAT
bits.

While we're here, mark the error() string for translation, and guard
against overflow when computing the expected size by wrapping it in an
st_mult() call.

Note that the pre-image of this patch was already sufficiently guarded
against overflow, since GRAPH_DATA_WIDTH is defined as
(the_hash_algo->rawsz + 16), so the expression in the parenthesis would
get performed as a size_t, and then g->num_commits would be promoted to
the width of size_t for the purposes of evaluating this expression.

But let's make it explicitly clear that this computation is safe by
wrapping it in an st_mult() call.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index cdefd7f926..97d4824673 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,16 +330,6 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-static int graph_read_commit_data(const unsigned char *chunk_start,
-				  size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
-		return error("commit-graph commit data chunk is wrong size");
-	g->chunk_commit_data = chunk_start;
-	return 0;
-}
-
 static int graph_read_generation_data(const unsigned char *chunk_start,
 				      size_t chunk_size, void *data)
 {
@@ -457,7 +447,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 			      256 * sizeof(uint32_t)))
 		error(_("commit-graph oid fanout chunk is wrong size"));
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
-	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
+	if (pair_chunk_expect(cf, GRAPH_CHUNKID_DATA,
+			      &graph->chunk_commit_data,
+			      st_mult(graph->num_commits, GRAPH_DATA_WIDTH)))
+		error(_("commit-graph commit data chunk is wrong size"));
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
 		   &graph->chunk_extra_edges_size);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
-- 
2.42.0.352.gd9c5062ff7.dirty


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox