($INBOX_DIR/description missing)
 help / color / mirror / Atom feed
* E::mailing list::::
From: ian gibson @ 2023-02-01 16:39 UTC (permalink / raw)
  To: git



Sent from my iPhone

^ permalink raw reply

* Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
From: D. Ben Knoble @ 2023-02-01 16:21 UTC (permalink / raw)
  To: demerphq; +Cc: git
In-Reply-To: <CANgJU+X_e0owKC3uWPaA_gVP54syF1+MJ-cTn+fjPrNS5LDsMA@mail.gmail.com>

On Wed, Feb 1, 2023 at 11:09 AM demerphq <demerphq@gmail.com> wrote:
> FWIW that looks pretty weird to me, like the escapes in the charclass
> were interpolated before being fed to the regex engine. Are you sure
> you tested the right thing?

Quite sure. `git diff --word-diff` fails. This was just a smaller
example based on the linked C code.

Here's the output of `git diff --word-diff` (verbatim and dumped):

```
fatal : invalid regular expression: \|([^\\]*)\||([^][)(}{[
])+|[^[:space:]]|[¿-ˇ][Ä-ø]+
00000000: 6661 7461 6cc2 a03a 2069 6e76 616c 6964  fatal..: invalid
00000010: 2072 6567 756c 6172 2065 7870 7265 7373   regular express
00000020: 696f 6e3a 205c 7c28 5b5e 5c5c 5d2a 295c  ion: \|([^\\]*)\
00000030: 7c7c 285b 5e5d 5b29 287d 7b5b 2009 5d29  ||([^][)(}{[ .])
00000040: 2b7c 5b5e 5b3a 7370 6163 653a 5d5d 7c5b  +|[^[:space:]]|[
00000050: c02d ff5d 5b80 2dbf 5d2b 0a              .-.][.-.]+.
```

-- 
D. Ben Knoble

^ permalink raw reply

* Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
From: demerphq @ 2023-02-01 16:09 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: git
In-Reply-To: <CALnO6CAZtwfGY4SYeOuKqdP9+e_0EYNf4F703DRQB7UUfd_bUg@mail.gmail.com>

On Wed, 1 Feb 2023 at 16:25, D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> I recently updated to git 2.39.1 and noticed today that `git diff
> --word-diff` fails for files with `diff=scheme`. I was able to narrow
> the failure down to the inclusion of control characters \xc0, \xff,
> \x80, \xbf by https://github.com/git/git/blob/2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473/userdiff.c#L17
> in the definition of the scheme diff pattern (really, all patterns).
>
> I suspect the commit referenced in the subject, given that it messes
> with regex handling on macOS.
>
> Relevant environment that I can think of:
> ```
> # locale
> LANG="fr_FR.UTF-8"
> LC_COLLATE="fr_FR.UTF-8"
> LC_CTYPE="fr_FR.UTF-8"
> LC_MESSAGES="fr_FR.UTF-8"
> LC_MONETARY="fr_FR.UTF-8"
> LC_NUMERIC="fr_FR.UTF-8"
> LC_TIME="fr_FR.UTF-8"
> LC_ALL="fr_FR.UTF-8"
> ```
>
> I'm on macOS 11.7.
>
> Failure (using Zsh to produce the characters; I think there's a Bash
> equivalent):
> ```
> # git diff --word-diff --word-diff-regex=$'[\xc0-\xff][\x80-\xbf]+'
> fatal¬†: invalid regular expression: [¿-ˇ][Ä-ø]+
> ```

FWIW that looks pretty weird to me, like the escapes in the charclass
were interpolated before being fed to the regex engine. Are you sure
you tested the right thing?

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply

* [PATCH] rev-list: clarify git-log default date format
From: Rafael Dulfer @ 2023-02-01 15:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Andrei Rybak, Rafael Dulfer

From: Rafael Dulfer <rafael.dulfer@gmail.com>

Currently, the documentation is slightly incomplete, not explaining all the differences the default format has with rfc2822. Leading to confusion for people trying to parse the date format outputted by git log

This patch adds 2 more exceptions when compared to rfc2822. Also adds an example of what the format looks like (I originally wanted to specify this in strftime notation, but because of the way day-of-month is formatted this is impossible)

Signed-off-by: Rafael Dulfer <rafael.dulfer@gmail.com>
---
 Documentation/rev-list-options.txt | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index ff68e48406..8bc8475f3e 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -1103,9 +1103,15 @@ format placeholders. When using `-local`, the correct syntax is
 `--date=default` is the default format, and is similar to
 `--date=rfc2822`, with a few exceptions:
 --
-	- there is no comma after the day-of-week
+	- There is no comma after the day-of-week
 
-	- the time zone is omitted when the local time zone is used
+	- The time zone is omitted when the local time zone is used
+
+	- Day-of-month and month are switched around
+
+	- Time-of-day and the year are switched around
+
+As a result, the format looks as follows: `Thu Jan 1 00:00:00 1970 +0000` with `+0000` being omitted when the local time zone is used.
 
 ifdef::git-rev-list[]
 --header::
-- 
2.39.1


^ permalink raw reply related

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: demerphq @ 2023-02-01 15:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Michal Suchánek, brian m. carlson, Konstantin Ryabitsev,
	Eli Schwartz, Git List
In-Reply-To: <230201.86cz6tqyvy.gmgdl@evledraar.gmail.com>

On Wed, 1 Feb 2023 at 14:49, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Wed, Feb 01 2023, demerphq wrote:
>
> > On Wed, 1 Feb 2023, 20:21 Michal Suchánek, <msuchanek@suse.de> wrote:
> >>
> >> On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
> >> > Why does it have to be gzip? It is not that hard to come up with a
> >
> >> historical reasons?
> >
> > Currently git doesn't advertise that archive creation is stable
> > right[1]? So I wrote that with the assumption that this new
> > compression would only be used when making a new archive with a
> > hypothetical new '--stable' option. So historical reasons don't come
> > up. Or was there some other form of history that you meant?
>
> We haven't advertised it, but people have come to rely on it, as the
> widespread breakages reported when upgrading to v2.38.0 at the start of
> this thread show.
>
> That's unfortunate, and those people probably shouldn't have done that,
> but that's water under the bridge. I think it would be irresponsible to
> change the output willy-nilly at this point, especially when it seems
> rather easy to find some compromise everyone will be happy with.
>
> > I'm just trying to point out here that stable compression is doable
> > and doesn't need to be as complex as specifying a stable gzip format.
> > I am not even saying git should just do this, just that it /could/ if
> > it decided that stability was important, and that doing so wouldn't
> > involve the complexity that Avar was implying would be needed.  Simple
> > compression like LZ variants are pretty straightforward to implement,
> > achieve pretty good compression and can run pretty fast.
> >
> > Yves
> > [1] if it did the issue kicking off this thread would not have
> > happened as there would be a test that would have noticed the change.
>
> I have some patches I'm about to submit to address issues in this
> thread, and it does add *a* test for archive output stability.
>
> But I'm not at all confident that it's exhaustive. I just found it by
> experiment, by locating tests ouf ours where the "git archive" output at
> the end is different with gzip and "git archive gzip".
>
> But is it guaranteed to find all potential cases where repository
> content might trigger different output with different gzip
> implementations? I don't know, but probably not.

BTW, I just happened to be looking at the zstd docs (I am updating
code that uses it), I saw this:

Zstandard's format is stable and documented in
[RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple
independent implementations are already available.
This repository represents the reference implementation, provided as
an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C**
library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz`
and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard
homepage](http://www.zstd.net/#other-languages).

So it sounds like that is a spec you could use. Not sure exactly what
they mean by "stable", but given the .gz compatibility maybe it would
be worth considering. Its a lot faster than zlib. (The library I
support includes Snappy, Zlib, and Zstd, and the latter is faster and
better than the other two.)

Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply

* RE: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
From: D. Ben Knoble @ 2023-02-01 15:18 UTC (permalink / raw)
  To: git

I recently updated to git 2.39.1 and noticed today that `git diff
--word-diff` fails for files with `diff=scheme`. I was able to narrow
the failure down to the inclusion of control characters \xc0, \xff,
\x80, \xbf by https://github.com/git/git/blob/2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473/userdiff.c#L17
in the definition of the scheme diff pattern (really, all patterns).

I suspect the commit referenced in the subject, given that it messes
with regex handling on macOS.

Relevant environment that I can think of:
```
# locale
LANG="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_CTYPE="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_ALL="fr_FR.UTF-8"
```

I'm on macOS 11.7.

Failure (using Zsh to produce the characters; I think there's a Bash
equivalent):
```
# git diff --word-diff --word-diff-regex=$'[\xc0-\xff][\x80-\xbf]+'
fatal¬†: invalid regular expression: [¿-ˇ][Ä-ø]+
```
(Looks like the output is a bit scrambled; here's the hexdump)
```
# !! |& xxd
00000000: 6661 7461 6cc2 a03a 2069 6e76 616c 6964  fatal..: invalid
00000010: 2072 6567 756c 6172 2065 7870 7265 7373   regular express
00000020: 696f 6e3a 205b c02d ff5d 5b80 2dbf 5d2b  ion: [.-.][.-.]+
00000030: 0a                                       .
```

-- 
D. Ben Knoble

^ permalink raw reply

* [PATCH v3] win32: check for NULL after creating thread
From: Rose via GitGitGadget @ 2023-02-01 14:40 UTC (permalink / raw)
  To: git; +Cc: Rose, Seija Kijin
In-Reply-To: <pull.1445.v2.git.git.1675176818033.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Check for NULL handles, not "INVALID_HANDLE,"
as CreateThread guarantees a valid handle in most cases.

The return value for failed thread creation is NULL,
not INVALID_HANDLE_VALUE, unlike other Windows API functions.

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: check for NULL after creating thread
    
    Check for NULL handles, not "INVALID_HANDLE," as CreateThread guarantees
    a valid handle in most cases.
    
    The return value for failed thread creation is NULL, not
    INVALID_HANDLE_VALUE, unlike other Windows API functions.
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1445%2FAtariDreams%2FhThread-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1445/AtariDreams/hThread-v3
Pull-Request: https://github.com/git/git/pull/1445

Range-diff vs v2:

 1:  c956cafdec9 ! 1:  1cbc43e0d82 win32: check for NULL after creating thread
     @@ Commit message
          as CreateThread guarantees a valid handle in most cases.
      
          The return value for failed thread creation is NULL,
     -    not INVALID_HANDLE_VALUE, unlike other Windows
     -    API functions.
     +    not INVALID_HANDLE_VALUE, unlike other Windows API functions.
      
          Signed-off-by: Seija Kijin <doremylover123@gmail.com>
      


 compat/winansi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..f83610f684d 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -644,7 +644,7 @@ void winansi_init(void)
 
 	/* start console spool thread on the pipe's read end */
 	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
+	if (!hthread)
 		die_lasterr("CreateThread(console_thread) failed");
 
 	/* schedule cleanup routine */

base-commit: 2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473
-- 
gitgitgadget

^ permalink raw reply related

* Re: [RFC/PATCH 0/6] hash-object: use fsck to check objects
From: Ævar Arnfjörð Bjarmason @ 2023-02-01 14:06 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, git
In-Reply-To: <d225dddc-973c-f710-9d24-cb53b26b973f@web.de>


On Sun, Jan 22 2023, René Scharfe wrote:

> Am 22.01.23 um 08:48 schrieb Jeff King:
>> We probably also should outright reject gigantic trees,
>> which closes out a whole class of integer truncation problems. I know
>> GitHub has rejected trees over 100MB for years for this reason.
>
> Makes sense.

I really don't think it does, let's not forever encode arbitrary limits
in the formats because of transitory implementation details.

Those sort of arbitrary limits are understandable for hosting providers,
and a sensible trade-off on that front.

But for git as a general tool? I'd like to be able to throw whatever
garbage I've got locally at it, and not have it complain.

We already have a deluge of int v.s. unsigned int v.s. size_t warnings
that we could address, we're just choosing to suppress them
currently. Instead we have hacks like cast_size_t_to_int().

Those sorts of hacks are understandable as band-aid fixes, but let's
work on fixing the real causes.

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Ævar Arnfjörð Bjarmason @ 2023-02-01 13:43 UTC (permalink / raw)
  To: demerphq
  Cc: Michal Suchánek, brian m. carlson, Konstantin Ryabitsev,
	Eli Schwartz, Git List
In-Reply-To: <CANgJU+VLseURimM++38WA81uFPbnoHiToOt4F4UFL9yVbQpBEw@mail.gmail.com>


On Wed, Feb 01 2023, demerphq wrote:

> On Wed, 1 Feb 2023, 20:21 Michal Suchánek, <msuchanek@suse.de> wrote:
>>
>> On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
>> > Why does it have to be gzip? It is not that hard to come up with a
>
>> historical reasons?
>
> Currently git doesn't advertise that archive creation is stable
> right[1]? So I wrote that with the assumption that this new
> compression would only be used when making a new archive with a
> hypothetical new '--stable' option. So historical reasons don't come
> up. Or was there some other form of history that you meant?

We haven't advertised it, but people have come to rely on it, as the
widespread breakages reported when upgrading to v2.38.0 at the start of
this thread show.

That's unfortunate, and those people probably shouldn't have done that,
but that's water under the bridge. I think it would be irresponsible to
change the output willy-nilly at this point, especially when it seems
rather easy to find some compromise everyone will be happy with.

> I'm just trying to point out here that stable compression is doable
> and doesn't need to be as complex as specifying a stable gzip format.
> I am not even saying git should just do this, just that it /could/ if
> it decided that stability was important, and that doing so wouldn't
> involve the complexity that Avar was implying would be needed.  Simple
> compression like LZ variants are pretty straightforward to implement,
> achieve pretty good compression and can run pretty fast.
>
> Yves
> [1] if it did the issue kicking off this thread would not have
> happened as there would be a test that would have noticed the change.

I have some patches I'm about to submit to address issues in this
thread, and it does add *a* test for archive output stability.

But I'm not at all confident that it's exhaustive. I just found it by
experiment, by locating tests ouf ours where the "git archive" output at
the end is different with gzip and "git archive gzip".

But is it guaranteed to find all potential cases where repository
content might trigger different output with different gzip
implementations? I don't know, but probably not.

^ permalink raw reply

* Re: [PATCH 6/6] hash-object: use fsck for object checks
From: Ævar Arnfjörð Bjarmason @ 2023-02-01 13:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, René Scharfe
In-Reply-To: <Y9pgG10dAoQABGXG@coredump.intra.peff.net>


On Wed, Feb 01 2023, Jeff King wrote:

> On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:
>
>> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>>  		}
>>  	}
>>  	if (flags & HASH_FORMAT_CHECK) {
>> -		if (type == OBJ_TREE)
>> -			check_tree(buf, size);
>> -		if (type == OBJ_COMMIT)
>> -			check_commit(buf, size);
>> -		if (type == OBJ_TAG)
>> -			check_tag(buf, size);
>> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
>> +
>> +		opts.strict = 1;
>> +		opts.error_func = hash_format_check_report;
>> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
>> +			die(_("refusing to create malformed object"));
>> +		fsck_finish(&opts);
>>  	}
>
> By the way, I wanted to call out one thing here that nobody mentioned
> during review: we are not checking the return value of fsck_finish().
>
> That is a bit of a weird function. We must call it because it cleans up
> any resources allocated during the fsck_buffer() call. But it also is
> the last chance to fsck any special blobs (like those that are found as
> .gitmodules, etc). We only find out the filenames while looking at the
> enclosing trees, so we queue them and then check the blobs later.
>
> So if we are hashing a blob, that is mostly fine. We will not have the
> blob's name queued as anything special, and so the fsck is a noop.
>
> But if we fsck a tree, and it has a .gitmodules entry pointing to blob
> X, then we would also pull X from the odb and fsck it during this
> "finish" phase.
>
> Which leads me to two diverging lines of thought:
>
>   1. One of my goals with this series is that one could add objects to
>      the repository via "git hash-object -w" and feel confident that no
>      fsck rules were violated, because fsck implements some security
>      checks. In the past when GitHub rolled out security checks this was
>      a major pain, because objects enter repositories not just from
>      pushes, but also from web-client activity (e.g., editing a blob on
>      the website). And since Git had no way to say "fsck just this
>      object", we ended up implementing the fsck checks multiple times,
>      in libgit2 and in some of its calling code.
>
>      So I was hoping that just passing the objects to "hash-object"
>      would be a viable solution. I'm not sure if it is or not. If you
>      just hash a blob, then we'll have no clue it's a .gitmodules file.
>      OTOH, you have to get the matching tree which binds the blob to the
>      .gitmodules path somehow. So if that tree is fsck'd, and then
>      checks the blob during fsck_finish(), that should be enough.
>      Assuming that fsck complains when the pointed-to blob cannot be
>      accessed, which I think it should (because really, incremental
>      pushes face the same problem).
>
>      In which case we really ought to be checking the result of
>      fsck_finish() here and complaining.
>
>   2. We're not checking fsck connectivity here, and that's intentional.
>      So you can "hash-object" a tree that points to blobs that we don't
>      actually have. But if you hash a tree that points a .gitmodules
>      entry at a blob that doesn't exist, then that will fail the fsck
>      (during the finish step). And respecting the fsck_finish() exit
>      code would break that.
>
>      As an addendum, in a regular fsck, many trees might mention the
>      same blob as .gitmodules, and we'll queue that blob to be checked
>      once. But here, we are potentially running a bunch of individual
>      fscks, one per object we hash. So if you had, say, 1000 trees that
>      all mentioned the same blob (because other entries were changing),
>      and you tried to hash them all with "hash-object --stdin-paths" or
>      similar, then we'd fsck that blob 1000 times.
>
>      Which isn't wrong, per se, but seems inefficient. Solving it would
>      require keeping track of what has been checked between calls to
>      index_mem(). Which is kind of awkward, seeing as how low-level it
>      is. It would be a lot more natural if all this checking happened in
>      hash-object itself.
>
> So I dunno. The code above is doing (2), albeit with the inefficiency of
> checking blobs that we might not care about. I kind of think (1) is the
> right thing, though, and anybody who really wants to make trees that
> point to bogus .gitmodules can use --literally.

Aside from the other things you bring up here, it seems wrong to me to
conflate --literally with some sort of "no fsck" or "don't fsck this
collection yet" mode.

Can't we have a "--no-fsck" or similar, which won't do any sort of full
fsck, but also won't accept bogus object types & the like?

Currently I believe (and I haven't had time to carefully review what you
have here) we only need --literally to produce objects that are truly
corrupt when viewed in isolation. E.g. a tag that refers to a bogus
object type etc.

But we have long supported a narrow view of what the fsck checks mean in
that context. E.g. now with "mktag" we'll use the fsck machinery, but
only skin-deep, so you can be referring to a tree which would in turn
fail our checks.

I tend to think that we should be keeping it like that, but documenting
that if you're creating such objects you either need to do it really
carefully, or follow it up with an operation that's guaranteed to fsck
the sum of the objects you've added recursively.

So, rather than teach e.g. "hash-object" to be smart about that we
should e.g. encourage a manual creation of trees/blobs/commits to be
followed-up with a "git push" to a new ref that refers to them, even if
that "git push" is to the repository located in the $PWD.

By doing that we offload the "what's new?" question to the
pack-over-the-wire machinery, which is well tested.

Anything else seems ultimately to be madness, after all if I feed a
newly crafted commit to "hash-object" how do we know where to stop,
other than essentially faking up a push negotiation with ourselves?

It's also worth noting that much of the complexity around .gitmodules in
particular is to support packfile-uri's odd notion of applying the
"last" part of the PACK before the "first" part, which nothing else
does.

Which, if we just blindly applied both, and then fsck'd the resulting
combination we'd get rid of that tricky special-case. But I haven't
benchmarked that. It should be a bit slower, particularly on a large
repository that won't fit in memory. But my hunch is that it won't be
too bad, and the resulting simplification may be worth it (particularly
now that we have bundle-uri, which doesn't share that edge-case).


^ permalink raw reply

* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Bezdeka, Florian @ 2023-02-01 13:05 UTC (permalink / raw)
  To: peff@peff.net
  Cc: git@vger.kernel.org, gitster@pobox.com, greg.pflaum@pnp-hcl.com,
	jacob.keller@gmail.com, gerhard@dest-unreach.org,
	sandals@crustytoothpaste.net
In-Reply-To: <494ac71b378b1afb4349a4fb86767f7f77e781b3.camel@siemens.com>

On Wed, 2023-02-01 at 13:53 +0100, Florian Bezdeka wrote:
> On Wed, 2023-02-01 at 07:28 -0500, Jeff King wrote:
> > On Wed, Feb 01, 2023 at 12:19:55AM +0100, Florian Bezdeka wrote:
> > 
> > > > Junio pointed out the excellent analysis from Peff regarding the
> > > > situation and the fact that socat is wrong here.
> > > 
> > > Thanks for pointing me to the old discussion. I was quite sure that I'm
> > > not the first one facing this problem but couldn't find something.
> > > 
> > > It might be that socat is doing something wrong. But git is the
> > > component that triggers the problem. Did someone talk to the socat
> > > maintainers yet?
> > 
> > I'm not sure I'd say that socat is wrong. It's a generic tool, and it
> > doesn't know what type of protocol the two sides are expecting, or how
> > they'll handle half-duplex shutdowns. The default behavior is to wait
> > 0.5 seconds to see if the other side has anything to say, which is a
> > reasonable compromise. It's just not enough for use a Git proxy in this
> > case.
> > 
> > The ideal, of course, would be an option to send the half-duplex
> > shutdown to the server and then wait for the server to hang up. But I
> > don't think it has such an option (you can just simulate it with a
> > really large "-t"). Netcat does, FWIW ("-q -1").
> 
> -t doesn't help here. With massive help from the socat maintainer
> (thanks Gerhard!, now in CC) I was able to get the following log out of
> socat:
> 
> 2023/02/01 11:06:29.960194 socat[18916] D read(0, 0x56111c858000, 8192)
> 2023/02/01 11:06:29.960208 socat[18916] D read -> 0
> 
> stdin had EOF. Socat half closes the socket:
> 
> 2023/02/01 11:06:29.960231 socat[18916] I shutdown(6, 1)
> 
> And then, within less than 0.2s, the peer (proxy?) closes the other
> channel:
> 
> 2023/02/01 11:06:30.118216 socat[18916] D read(6, 0x56111c858000, 8192)
> 2023/02/01 11:06:30.118238 socat[18916] D read -> 0
> 
> It's quite clear now that the remote peer (proxy or server) closes the
> complete connection after receiving the partial shutdown. That's
> nothing that is under my control.
> 
> With privoxy and the infrastructure at work (zscaler based) there are
> at least two proxy implementations showing this behavior. 
> 
> Switching to ncat --no-shutdown qualifies as workaround for now, but so
> far I didn't manage to get socat back into the game. Downgrading git is
> the other possibility.

With another hint from Gerhard:

Using ignoreeof works!

My proxy script does now (for one of the test scenarios):

socat STDIO,ignoreeof PROXY:<proxy-dns-name>:git.code.sf.net:9418,proxyport=9400
           ^
           This is new
> 
> > 
> > > Peff also mentioned that the half-duplex shutdown of the socket is
> > > inconsistent between proxy and raw TCP git://. It seems still a valid
> > > option to skip the half-shutdown for the git:// proxy scenario.
> > 
> > It could be done, but that would reintroduce the "oops, socat died while
> > we were waiting" that ae1a7eefff was solving. The original motivation
> > was with ssh, but the same problem exists for proxies. It _doesn't_
> > exist for raw TCP, because nobody notices the connection died (we just
> > close() it), and there's no error to propagate.
> > 
> > The raw TCP version does still suffer from leaving the connection open
> > unnecessarily, so it would benefit from getting the same treatment. I
> > didn't care enough to implement it (and TBH, I kind of hoped that git://
> > was on the decline; especially with the v2 protocol, it's pretty much
> > worse in every way than git-over-http).
> > 
> > > > What value of -t did you try?
> > > 
> > > I was playing with -t 10 and -t 60 so far. Both does not work for
> > > cloning a kernel stable tree. I guess it's hard to find a value that
> > > works under all circumstances as timings might be different depending
> > > on server/network speed.
> > 
> > Anything over "5" should be sufficient, because the other side should be
> > sending keep-alive packets (at the Git protocol level) every 5 seconds.
> > It might be worth running socat under strace to see what it's seeing and
> > doing.
> > 
> > Another workaround is to set protocol.version to "0" in your Git config.
> > 
> > -Peff
> 


^ permalink raw reply

* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Florian Bezdeka @ 2023-02-01 12:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Jacob Keller, brian m. carlson, git@vger.kernel.org,
	gitster@pobox.com, greg.pflaum@pnp-hcl.com, Gerhard Rieger
In-Reply-To: <Y9pa/YHnrrMU/ufV@coredump.intra.peff.net>

On Wed, 2023-02-01 at 07:28 -0500, Jeff King wrote:
> On Wed, Feb 01, 2023 at 12:19:55AM +0100, Florian Bezdeka wrote:
> 
> > > Junio pointed out the excellent analysis from Peff regarding the
> > > situation and the fact that socat is wrong here.
> > 
> > Thanks for pointing me to the old discussion. I was quite sure that I'm
> > not the first one facing this problem but couldn't find something.
> > 
> > It might be that socat is doing something wrong. But git is the
> > component that triggers the problem. Did someone talk to the socat
> > maintainers yet?
> 
> I'm not sure I'd say that socat is wrong. It's a generic tool, and it
> doesn't know what type of protocol the two sides are expecting, or how
> they'll handle half-duplex shutdowns. The default behavior is to wait
> 0.5 seconds to see if the other side has anything to say, which is a
> reasonable compromise. It's just not enough for use a Git proxy in this
> case.
> 
> The ideal, of course, would be an option to send the half-duplex
> shutdown to the server and then wait for the server to hang up. But I
> don't think it has such an option (you can just simulate it with a
> really large "-t"). Netcat does, FWIW ("-q -1").

-t doesn't help here. With massive help from the socat maintainer
(thanks Gerhard!, now in CC) I was able to get the following log out of
socat:

2023/02/01 11:06:29.960194 socat[18916] D read(0, 0x56111c858000, 8192)
2023/02/01 11:06:29.960208 socat[18916] D read -> 0

stdin had EOF. Socat half closes the socket:

2023/02/01 11:06:29.960231 socat[18916] I shutdown(6, 1)

And then, within less than 0.2s, the peer (proxy?) closes the other
channel:

2023/02/01 11:06:30.118216 socat[18916] D read(6, 0x56111c858000, 8192)
2023/02/01 11:06:30.118238 socat[18916] D read -> 0

It's quite clear now that the remote peer (proxy or server) closes the
complete connection after receiving the partial shutdown. That's
nothing that is under my control.

With privoxy and the infrastructure at work (zscaler based) there are
at least two proxy implementations showing this behavior. 

Switching to ncat --no-shutdown qualifies as workaround for now, but so
far I didn't manage to get socat back into the game. Downgrading git is
the other possibility.

> 
> > Peff also mentioned that the half-duplex shutdown of the socket is
> > inconsistent between proxy and raw TCP git://. It seems still a valid
> > option to skip the half-shutdown for the git:// proxy scenario.
> 
> It could be done, but that would reintroduce the "oops, socat died while
> we were waiting" that ae1a7eefff was solving. The original motivation
> was with ssh, but the same problem exists for proxies. It _doesn't_
> exist for raw TCP, because nobody notices the connection died (we just
> close() it), and there's no error to propagate.
> 
> The raw TCP version does still suffer from leaving the connection open
> unnecessarily, so it would benefit from getting the same treatment. I
> didn't care enough to implement it (and TBH, I kind of hoped that git://
> was on the decline; especially with the v2 protocol, it's pretty much
> worse in every way than git-over-http).
> 
> > > What value of -t did you try?
> > 
> > I was playing with -t 10 and -t 60 so far. Both does not work for
> > cloning a kernel stable tree. I guess it's hard to find a value that
> > works under all circumstances as timings might be different depending
> > on server/network speed.
> 
> Anything over "5" should be sufficient, because the other side should be
> sending keep-alive packets (at the Git protocol level) every 5 seconds.
> It might be worth running socat under strace to see what it's seeing and
> doing.
> 
> Another workaround is to set protocol.version to "0" in your Git config.
> 
> -Peff


^ permalink raw reply

* Re: [PATCH 6/6] hash-object: use fsck for object checks
From: Jeff King @ 2023-02-01 12:50 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, René Scharfe,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <Y8haHL9xIWntSm0/@coredump.intra.peff.net>

On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:

> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>  		}
>  	}
>  	if (flags & HASH_FORMAT_CHECK) {
> -		if (type == OBJ_TREE)
> -			check_tree(buf, size);
> -		if (type == OBJ_COMMIT)
> -			check_commit(buf, size);
> -		if (type == OBJ_TAG)
> -			check_tag(buf, size);
> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
> +
> +		opts.strict = 1;
> +		opts.error_func = hash_format_check_report;
> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
> +			die(_("refusing to create malformed object"));
> +		fsck_finish(&opts);
>  	}

By the way, I wanted to call out one thing here that nobody mentioned
during review: we are not checking the return value of fsck_finish().

That is a bit of a weird function. We must call it because it cleans up
any resources allocated during the fsck_buffer() call. But it also is
the last chance to fsck any special blobs (like those that are found as
.gitmodules, etc). We only find out the filenames while looking at the
enclosing trees, so we queue them and then check the blobs later.

So if we are hashing a blob, that is mostly fine. We will not have the
blob's name queued as anything special, and so the fsck is a noop.

But if we fsck a tree, and it has a .gitmodules entry pointing to blob
X, then we would also pull X from the odb and fsck it during this
"finish" phase.

Which leads me to two diverging lines of thought:

  1. One of my goals with this series is that one could add objects to
     the repository via "git hash-object -w" and feel confident that no
     fsck rules were violated, because fsck implements some security
     checks. In the past when GitHub rolled out security checks this was
     a major pain, because objects enter repositories not just from
     pushes, but also from web-client activity (e.g., editing a blob on
     the website). And since Git had no way to say "fsck just this
     object", we ended up implementing the fsck checks multiple times,
     in libgit2 and in some of its calling code.

     So I was hoping that just passing the objects to "hash-object"
     would be a viable solution. I'm not sure if it is or not. If you
     just hash a blob, then we'll have no clue it's a .gitmodules file.
     OTOH, you have to get the matching tree which binds the blob to the
     .gitmodules path somehow. So if that tree is fsck'd, and then
     checks the blob during fsck_finish(), that should be enough.
     Assuming that fsck complains when the pointed-to blob cannot be
     accessed, which I think it should (because really, incremental
     pushes face the same problem).

     In which case we really ought to be checking the result of
     fsck_finish() here and complaining.

  2. We're not checking fsck connectivity here, and that's intentional.
     So you can "hash-object" a tree that points to blobs that we don't
     actually have. But if you hash a tree that points a .gitmodules
     entry at a blob that doesn't exist, then that will fail the fsck
     (during the finish step). And respecting the fsck_finish() exit
     code would break that.

     As an addendum, in a regular fsck, many trees might mention the
     same blob as .gitmodules, and we'll queue that blob to be checked
     once. But here, we are potentially running a bunch of individual
     fscks, one per object we hash. So if you had, say, 1000 trees that
     all mentioned the same blob (because other entries were changing),
     and you tried to hash them all with "hash-object --stdin-paths" or
     similar, then we'd fsck that blob 1000 times.

     Which isn't wrong, per se, but seems inefficient. Solving it would
     require keeping track of what has been checked between calls to
     index_mem(). Which is kind of awkward, seeing as how low-level it
     is. It would be a lot more natural if all this checking happened in
     hash-object itself.

So I dunno. The code above is doing (2), albeit with the inefficiency of
checking blobs that we might not care about. I kind of think (1) is the
right thing, though, and anybody who really wants to make trees that
point to bogus .gitmodules can use --literally.

-Peff

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: demerphq @ 2023-02-01 12:48 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
	Konstantin Ryabitsev, Eli Schwartz, Git List
In-Reply-To: <20230201122152.GJ19419@kitsune.suse.cz>

On Wed, 1 Feb 2023, 20:21 Michal Suchánek, <msuchanek@suse.de> wrote:
>
> On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
> > Why does it have to be gzip? It is not that hard to come up with a

> historical reasons?

Currently git doesn't advertise that archive creation is stable
right[1]? So I wrote that with the assumption that this new
compression would only be used when making a new archive with a
hypothetical new '--stable' option. So historical reasons don't come
up. Or was there some other form of history that you meant?

I'm just trying to point out here that stable compression is doable
and doesn't need to be as complex as specifying a stable gzip format.
I am not even saying git should just do this, just that it /could/ if
it decided that stability was important, and that doing so wouldn't
involve the complexity that Avar was implying would be needed.  Simple
compression like LZ variants are pretty straightforward to implement,
achieve pretty good compression and can run pretty fast.

Yves
[1] if it did the issue kicking off this thread would not have
happened as there would be a test that would have noticed the change.

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Ævar Arnfjörð Bjarmason @ 2023-02-01 12:42 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Eli Schwartz, Git List
In-Reply-To: <Y9jlWYLzZ/yy4NqD@tapette.crustytoothpaste.net>


On Tue, Jan 31 2023, brian m. carlson wrote:

> Since then, I've been very opposed to us guaranteeing output format
> consistency without explicitly doing so.  I had sent some patches before
> that I don't think ever got picked up that documented this explicitly.
> I very much don't want people to come to rely on our behaviour unless we
> explicitly guarantee it.

FWIW I think the reason that didn't get picked up (I went back and read
the discussion) is that there was some feedback on the v1, [1] suggested
(at least to me) that you'd re-roll it, but that re-roll never seems to
have made it to the list.

1. https://lore.kernel.org/git/YD7aDwX%2FaiRN0GZs@camp.crustytoothpaste.net/

^ permalink raw reply

* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Jeff King @ 2023-02-01 12:28 UTC (permalink / raw)
  To: Florian Bezdeka
  Cc: Jacob Keller, brian m. carlson, git@vger.kernel.org,
	gitster@pobox.com, greg.pflaum@pnp-hcl.com
In-Reply-To: <840bbd91453529571a9d4f13472a12f6e472d198.camel@siemens.com>

On Wed, Feb 01, 2023 at 12:19:55AM +0100, Florian Bezdeka wrote:

> > Junio pointed out the excellent analysis from Peff regarding the
> > situation and the fact that socat is wrong here.
> 
> Thanks for pointing me to the old discussion. I was quite sure that I'm
> not the first one facing this problem but couldn't find something.
> 
> It might be that socat is doing something wrong. But git is the
> component that triggers the problem. Did someone talk to the socat
> maintainers yet?

I'm not sure I'd say that socat is wrong. It's a generic tool, and it
doesn't know what type of protocol the two sides are expecting, or how
they'll handle half-duplex shutdowns. The default behavior is to wait
0.5 seconds to see if the other side has anything to say, which is a
reasonable compromise. It's just not enough for use a Git proxy in this
case.

The ideal, of course, would be an option to send the half-duplex
shutdown to the server and then wait for the server to hang up. But I
don't think it has such an option (you can just simulate it with a
really large "-t"). Netcat does, FWIW ("-q -1").

> Peff also mentioned that the half-duplex shutdown of the socket is
> inconsistent between proxy and raw TCP git://. It seems still a valid
> option to skip the half-shutdown for the git:// proxy scenario.

It could be done, but that would reintroduce the "oops, socat died while
we were waiting" that ae1a7eefff was solving. The original motivation
was with ssh, but the same problem exists for proxies. It _doesn't_
exist for raw TCP, because nobody notices the connection died (we just
close() it), and there's no error to propagate.

The raw TCP version does still suffer from leaving the connection open
unnecessarily, so it would benefit from getting the same treatment. I
didn't care enough to implement it (and TBH, I kind of hoped that git://
was on the decline; especially with the v2 protocol, it's pretty much
worse in every way than git-over-http).

> > What value of -t did you try?
> 
> I was playing with -t 10 and -t 60 so far. Both does not work for
> cloning a kernel stable tree. I guess it's hard to find a value that
> works under all circumstances as timings might be different depending
> on server/network speed.

Anything over "5" should be sufficient, because the other side should be
sending keep-alive packets (at the Git protocol level) every 5 seconds.
It might be worth running socat under strace to see what it's seeing and
doing.

Another workaround is to set protocol.version to "0" in your Git config.

-Peff

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Raymond E. Pasco @ 2023-02-01 12:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, brian m. carlson
  Cc: Konstantin Ryabitsev, Eli Schwartz, Git List
In-Reply-To: <230201.86pmatr9mj.gmgdl@evledraar.gmail.com>

February 1, 2023 4:40 AM, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com> wrote:
> As people have come to rely on the exact "deflate"
> implementation "git archive" promises to invoke the system's
> "gzip" binary by default, under the assumption that its output
> is stable. If that's no longer the case you'll need to complain
> to whoever maintains your local "gzip".

Surely if reproducibility of .tar.gz files is the goal,"invoke
whatever arbitrary binary on $PATH happens to be called gzip" is an
poor solution.

It is only even possible to consider stabilizing gzip output as a
goal for Git (although this seems ill-advised for the reasons
Brian already discussed) in the post-2.38 world where git is
doing the gzipping.

If one has the requirement to substitute one's own specific
compressor, there is an option for that.

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Michal Suchánek @ 2023-02-01 12:21 UTC (permalink / raw)
  To: demerphq
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
	Konstantin Ryabitsev, Eli Schwartz, Git List
In-Reply-To: <CANgJU+V0QRFwmTh8ZzY=28kmbUw=DvSLE24LioOXp6_ozq+RdA@mail.gmail.com>

On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
> On Wed, 1 Feb 2023 at 11:26, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > That would be going above & beyond what's needed IMO, but still a lot
> > easier than the daunting task of writing a specification that exactly
> > described GNU gzip's current behavior, to the point where you could
> > clean-room implement it and be guaranteed byte-for-byte compatibility.
> 
> Why does it have to be gzip? It is not that hard to come up with a
historical reasons?

^ permalink raw reply

* Re: [PATCH v2] credential: new attribute password_expiry_utc
From: Jeff King @ 2023-02-01 12:10 UTC (permalink / raw)
  To: M Hickford via GitGitGadget
  Cc: git, Eric Sunshine, Cheetham, Dennington, M Hickford
In-Reply-To: <pull.1443.v2.git.git.1675244392025.gitgitgadget@gmail.com>

On Wed, Feb 01, 2023 at 09:39:51AM +0000, M Hickford via GitGitGadget wrote:

> +`password_expiry_utc`::
> +
> +	If password is a personal access token or OAuth access token, it may have an
> +	expiry date. When getting credentials from a helper, `git credential fill`
> +	ignores the password attribute if the expiry date has passed. Storage
> +	helpers should store this attribute if possible. Helpers should not
> +	implement expiry logic themselves. Represented as Unix time UTC, seconds
> +	since 1970.

This "should not" seems weird to me. The logic you have here throws out
entries that have expired when they pass through Git. But wouldn't
helpers which store things want to know about and act on the expiration,
too?

For example, if Git learns about a credential that expires in 60
seconds and passes it to credential-cache which is configured
--timeout=300, wouldn't it want to set its internal ttl on the
credential to 60, rather than 300?

I think your plan here is that Git would then reject the credential if a
request is made at time now+65. But the cache is holding onto it much
longer than necessary.

Likewise, wouldn't anything that stores credentials at least want to be
able to store and regurgitate the expiration? For instance, even
credential-store would want to do this. I'm OK if it doesn't, and we can
consider it a quality-of-implementation issue and see if anybody cares
enough to implement it. But I'd think most "real" helpers would want to
do so.

So it seems like helpers really do need to support this "expiration"
notion. And it's actually Git itself which doesn't need to care about
it, assuming the helpers are doing something sensible (though it is OK
if Git _also_ throws away expired credentials to support helpers which
don't).

> diff --git a/builtin/credential-cache--daemon.c b/builtin/credential-cache--daemon.c
> index f3c89831d4a..338058be7f9 100644
> --- a/builtin/credential-cache--daemon.c
> +++ b/builtin/credential-cache--daemon.c
> @@ -127,6 +127,9 @@ static void serve_one_client(FILE *in, FILE *out)
>  		if (e) {
>  			fprintf(out, "username=%s\n", e->item.username);
>  			fprintf(out, "password=%s\n", e->item.password);
> +			if (e->item.password_expiry_utc != TIME_MAX)
> +				fprintf(out, "password_expiry_utc=%"PRItime"\n",
> +					e->item.password_expiry_utc);
>  		}

Is there a particular reason to use TIME_MAX as the sentinel value here,
and not just "0"? It's not that big a deal either way, but it's more
usual in our code base to use "0" if there's no reason not to (and it
seems like nothing should be expiring in 1970 these days).

> @@ -195,15 +196,20 @@ static void credential_getpass(struct credential *c)
>  	if (!c->username)
>  		c->username = credential_ask_one("Username", c,
>  						 PROMPT_ASKPASS|PROMPT_ECHO);
> -	if (!c->password)
> +	if (!c->password || c->password_expiry_utc < time(NULL)) {

This is comparing a timestamp_t to a time_t, which may mix
signed/unsigned. I can't offhand think of anything that would go too
wrong there before 2038, so it's probably OK, but I wanted to call it
out.

> @@ -225,6 +231,7 @@ int credential_read(struct credential *c, FILE *fp)
>  		} else if (!strcmp(key, "password")) {
>  			free(c->password);
>  			c->password = xstrdup(value);
> +			password_updated = 1;
>  		} else if (!strcmp(key, "protocol")) {
>  			free(c->protocol);
>  			c->protocol = xstrdup(value);
> @@ -234,6 +241,11 @@ int credential_read(struct credential *c, FILE *fp)
>  		} else if (!strcmp(key, "path")) {
>  			free(c->path);
>  			c->path = xstrdup(value);
> +		} else if (!strcmp(key, "password_expiry_utc")) {
> +			this_password_expiry = parse_timestamp(value, NULL, 10);
> +			if (this_password_expiry == 0 || errno) {
> +				this_password_expiry = TIME_MAX;
> +			}
>  		} else if (!strcmp(key, "url")) {
>  			credential_from_url(c, value);
>  		} else if (!strcmp(key, "quit")) {
> @@ -246,6 +258,9 @@ int credential_read(struct credential *c, FILE *fp)
>  		 */
>  	}
>  
> +	if (password_updated)
> +		c->password_expiry_utc = this_password_expiry;

Do we need this logic? It seems weird that a helper would output an
expiration but not a password in the first place. I guess ignoring the
expiration is probably a reasonable outcome, but I wonder if a helper
would ever want to just add an expiration to the data coming from
another helper.

I.e., could we just read the value directly into c->password_expiry_utc
as we do with other fields?

-Peff

^ permalink raw reply

* Re: Inconsistency between git-log documentation and behavior regarding default date format.
From: Jeff King @ 2023-02-01 11:51 UTC (permalink / raw)
  To: Andrei Rybak; +Cc: Rafael Dulfer, git
In-Reply-To: <8d72f484-6a72-87f9-44f1-3a38471201db@gmail.com>

On Wed, Feb 01, 2023 at 02:33:49AM +0100, Andrei Rybak wrote:

> Indeed, the description of the option (which comes from
> Documentation/rev-list-options.txt) doesn't describe all differences
> between --date=default and --date=rfc2822.
> 
> A fuller list could be:
> 
>     * There is no comma after the day-of-week
>     * The time zone is omitted when the local time zone is used
>     * Day-of-month and month are switched around
>     * Time-of-day and the year are switched around
> 
> CC'ing Peff, who wrote the list of exceptions in add00ba2de (date: make
> "local" orthogonal to date format, 2015-09-03).

I don't think I had any particular reason to omit those other
differences. I probably just didn't notice them. Patches welcome.

-Peff

^ permalink raw reply

* [PATCH 4/4] t/lib-httpd: increase ssl key size to 2048 bits
From: Jeff King @ 2023-02-01 11:39 UTC (permalink / raw)
  To: git; +Cc: Todd Zullinger
In-Reply-To: <Y9pOmR5fOfCHwYpF@coredump.intra.peff.net>

Recent versions of openssl will refuse to work with 1024-bit RSA keys,
as they are considered insecure. I didn't track down the exact version
in which the defaults were tightened, but the Debian-package openssl 3.0
on my system yields:

  $ LIB_HTTPD_SSL=1 ./t5551-http-fetch-smart.sh -v -i
  [...]
  SSL Library Error: error:0A00018F:SSL routines::ee key too small
  1..0 # SKIP web server setup failed

This could probably be overcome with configuration, but that's likely
to be a headache (especially if it requires touching /etc/openssl).
Let's just pick a key size that's less outrageously out of date.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd/ssl.cnf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/lib-httpd/ssl.cnf b/t/lib-httpd/ssl.cnf
index 6dab2579cb..812e8253f0 100644
--- a/t/lib-httpd/ssl.cnf
+++ b/t/lib-httpd/ssl.cnf
@@ -1,7 +1,7 @@
 RANDFILE                = $ENV::RANDFILE_PATH
 
 [ req ]
-default_bits            = 1024
+default_bits            = 2048
 distinguished_name      = req_distinguished_name
 prompt                  = no
 [ req_distinguished_name ]
-- 
2.39.1.767.gb4615b3bd3

^ permalink raw reply related

* [PATCH 3/4] t/lib-httpd: drop SSLMutex config
From: Jeff King @ 2023-02-01 11:39 UTC (permalink / raw)
  To: git; +Cc: Todd Zullinger
In-Reply-To: <Y9pOmR5fOfCHwYpF@coredump.intra.peff.net>

The SSL config enabled by setting LIB_HTTPD_SSL does not work with
Apache versions greater than 2.2, as more recent versions complain about
the SSLMutex directive. According to
https://httpd.apache.org/docs/current/upgrading.html:

  Directives AcceptMutex, LockFile, RewriteLock, SSLMutex,
  SSLStaplingMutex, and WatchdogMutexPath have been replaced with a
  single Mutex directive. You will need to evaluate any use of these
  removed directives in your 2.2 configuration to determine if they can
  just be deleted or will need to be replaced using Mutex.

Deleting this line will just use the system default, which seems
sensible. The original came as part of faa4bc35a0 (http-push: add
regression tests, 2008-02-27), but no specific reason is given there (or
on the mailing list) for its presence.

Signed-off-by: Jeff King <peff@peff.net>
---
If we can't drop support for 2.2 in patch 2, then this would need to be
adjusted with an IfVersion.

 t/lib-httpd/apache.conf | 1 -
 1 file changed, 1 deletion(-)

diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 332617f10d..51a4fbcf62 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -203,7 +203,6 @@ SSLCertificateKeyFile httpd.pem
 SSLRandomSeed startup file:/dev/urandom 512
 SSLRandomSeed connect file:/dev/urandom 512
 SSLSessionCache none
-SSLMutex file:ssl_mutex
 SSLEngine On
 </IfDefine>
 
-- 
2.39.1.767.gb4615b3bd3


^ permalink raw reply related

* [PATCH 2/4] t/lib-httpd: bump required apache version to 2.4
From: Jeff King @ 2023-02-01 11:38 UTC (permalink / raw)
  To: git; +Cc: Todd Zullinger
In-Reply-To: <Y9pOmR5fOfCHwYpF@coredump.intra.peff.net>

Apache 2.4 has been out since early 2012, almost 11 years. And its
predecessor, 2.2, has been out of support since its last release in
2017, over 5 years ago. The last mention on the mailing list was from
around the same time, in this thread:

  https://lore.kernel.org/git/20171231023234.21215-1-tmz@pobox.com/

We can probably assume that 2.4 is available everywhere. And the stakes
are fairly low, as the worst case is that such a platform would skip the
http tests.

This lets us clean up a few minor version checks in the config file, but
also revert f1f2b45be0 (tests: adjust the configuration for Apache 2.2,
2016-05-09). Its technique isn't _too_ bad, but certainly required a bit
more explanation than the 2.4 version it replaced. I manually confirmed
that the test in t5551 still behaves as expected (if you replace
"cadabra" with "foo", the server correctly rejects the request).

It will also help future patches which will no longer have to deal with
conditional config for this old version.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh          |  4 ++--
 t/lib-httpd/apache.conf | 22 ++++------------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 8fc411ff41..5d2d56c445 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -108,10 +108,10 @@ then
 	if test -z "$LIB_HTTPD_MODULE_PATH"
 	then
 		if ! test "$HTTPD_VERSION_MAJOR" -eq 2 ||
-		   ! test "$HTTPD_VERSION_MINOR" -ge 2
+		   ! test "$HTTPD_VERSION_MINOR" -ge 4
 		then
 			test_skip_or_die GIT_TEST_HTTPD \
-				"at least Apache version 2.2 is required"
+				"at least Apache version 2.4 is required"
 		fi
 		if ! test -d "$DEFAULT_HTTPD_MODULE_PATH"
 		then
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 35f5e28507..332617f10d 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -34,10 +34,6 @@ LoadModule http2_module modules/mod_http2.so
 Protocols h2c
 </IfDefine>
 
-<IfVersion < 2.4>
-LockFile accept.lock
-</IfVersion>
-
 <IfModule !mod_auth_basic.c>
 	LoadModule auth_basic_module modules/mod_auth_basic.so
 </IfModule>
@@ -51,7 +47,6 @@ LockFile accept.lock
 	LoadModule authz_host_module modules/mod_authz_host.so
 </IfModule>
 
-<IfVersion >= 2.4>
 <IfModule !mod_authn_core.c>
 	LoadModule authn_core_module modules/mod_authn_core.so
 </IfModule>
@@ -75,7 +70,6 @@ LockFile accept.lock
 	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
 </IfModule>
 </IfDefine>
-</IfVersion>
 
 PassEnv GIT_VALGRIND
 PassEnv GIT_VALGRIND_OPTIONS
@@ -115,6 +109,10 @@ Alias /auth/dumb/ www/auth/dumb/
 	Header set Set-Cookie name=value
 </LocationMatch>
 <LocationMatch /smart_headers/>
+	<RequireAll>
+		Require expr %{HTTP:x-magic-one} == 'abra'
+		Require expr %{HTTP:x-magic-two} == 'cadabra'
+	</RequireAll>
 	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
 	SetEnv GIT_HTTP_EXPORT_ALL
 </LocationMatch>
@@ -197,18 +195,6 @@ RewriteRule ^/intern-redir/(.*)/foo$ /smart/$1 [PT]
 RewriteRule ^/redir-objects/(.*/info/refs)$ /dumb/$1 [PT]
 RewriteRule ^/redir-objects/(.*/objects/.*)$ /dumb/$1 [R=301]
 
-# Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
-# And as RewriteCond does not allow testing for non-matches, we match
-# the desired case first (one has abra, two has cadabra), and let it
-# pass by marking the RewriteRule as [L], "last rule, do not process
-# any other matching RewriteRules after this"), and then have another
-# RewriteRule that matches all other cases and lets them fail via '[F]',
-# "fail the request".
-RewriteCond %{HTTP:x-magic-one} =abra
-RewriteCond %{HTTP:x-magic-two} =cadabra
-RewriteRule ^/smart_headers/.* - [L]
-RewriteRule ^/smart_headers/.* - [F]
-
 <IfDefine SSL>
 LoadModule ssl_module modules/mod_ssl.so
 
-- 
2.39.1.767.gb4615b3bd3


^ permalink raw reply related

* [PATCH 1/4] t/lib-httpd: bump required apache version to 2.2
From: Jeff King @ 2023-02-01 11:37 UTC (permalink / raw)
  To: git; +Cc: Todd Zullinger
In-Reply-To: <Y9pOmR5fOfCHwYpF@coredump.intra.peff.net>

Apache 2.2 was released in 2005, almost 18 years ago. We can probably
assume that people are running a version at least that old (and the
stakes for removing it are fairly low, as the worst case is that they
would not run the http tests against their ancient version).

Dropping support for the older versions cleans up the config file a
little, and will also enable us to bump the required version further
(with more cleanups) in a future patch.

Note that the file actually checks for version 2.1. In apache's
versioning scheme, odd numbered versions are for development and even
numbers are for stable releases. So 2.1 and 2.2 are effectively the same
from our perspective.

Older versions would just fail to start, which would generally cause us
to skip the tests. However, we do have version detection code in
lib-httpd.sh which produces a nicer error message, so let's update that,
too. I didn't bother handling the case of "3.0", etc. Apache has been on
2.x for 21 years, with no signs of bumping the major version.  And if
they eventually do, I suspect there will be enough breaking changes that
we'd need to update more than just the numeric version check. We can
worry about that hypothetical when it happens.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh          | 11 +++++++----
 t/lib-httpd/apache.conf |  8 --------
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 608949ea80..8fc411ff41 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -99,16 +99,19 @@ then
 fi
 
 HTTPD_VERSION=$($LIB_HTTPD_PATH -v | \
-	sed -n 's/^Server version: Apache\/\([0-9]*\)\..*$/\1/p; q')
+	sed -n 's/^Server version: Apache\/\([0-9.]*\).*$/\1/p; q')
+HTTPD_VERSION_MAJOR=$(echo $HTTPD_VERSION | cut -d. -f1)
+HTTPD_VERSION_MINOR=$(echo $HTTPD_VERSION | cut -d. -f2)
 
-if test -n "$HTTPD_VERSION"
+if test -n "$HTTPD_VERSION_MAJOR"
 then
 	if test -z "$LIB_HTTPD_MODULE_PATH"
 	then
-		if ! test $HTTPD_VERSION -ge 2
+		if ! test "$HTTPD_VERSION_MAJOR" -eq 2 ||
+		   ! test "$HTTPD_VERSION_MINOR" -ge 2
 		then
 			test_skip_or_die GIT_TEST_HTTPD \
-				"at least Apache version 2 is required"
+				"at least Apache version 2.2 is required"
 		fi
 		if ! test -d "$DEFAULT_HTTPD_MODULE_PATH"
 		then
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 0294739a77..35f5e28507 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -38,13 +38,6 @@ Protocols h2c
 LockFile accept.lock
 </IfVersion>
 
-<IfVersion < 2.1>
-<IfModule !mod_auth.c>
-	LoadModule auth_module modules/mod_auth.so
-</IfModule>
-</IfVersion>
-
-<IfVersion >= 2.1>
 <IfModule !mod_auth_basic.c>
 	LoadModule auth_basic_module modules/mod_auth_basic.so
 </IfModule>
@@ -57,7 +50,6 @@ LockFile accept.lock
 <IfModule !mod_authz_host.c>
 	LoadModule authz_host_module modules/mod_authz_host.so
 </IfModule>
-</IfVersion>
 
 <IfVersion >= 2.4>
 <IfModule !mod_authn_core.c>
-- 
2.39.1.767.gb4615b3bd3


^ permalink raw reply related

* [PATCH 0/4] t/lib-httpd ssl fixes
From: Jeff King @ 2023-02-01 11:35 UTC (permalink / raw)
  To: git; +Cc: Todd Zullinger

While chasing down a possible HTTP/2 problem in our test suite (which
turns out not to be a Git bug, I think), I tried running the tests with
LIB_HTTPD_SSL=1, as TLS affects HTTP/2 upgrade.

Sadly, apache would not start at all with our ssl config. It looks like
this has probably been broken for many years, depending on your apache
and openssl versions.

The final two patches here fix ssl problems I found. The first two
patches drop support for older apache. This yields some minor cleanups,
and makes the ssl fixes slightly easier. I've cc'd Todd as the last
person to express support for Apache 2.2, in 2017. I'm hoping even
CentOS has moved on by now, but we'll see. :)

  [1/4]: t/lib-httpd: bump required apache version to 2.2
  [2/4]: t/lib-httpd: bump required apache version to 2.4
  [3/4]: t/lib-httpd: drop SSLMutex config
  [4/4]: t/lib-httpd: increase ssl key size to 2048 bits

 t/lib-httpd.sh          | 11 +++++++----
 t/lib-httpd/apache.conf | 31 ++++---------------------------
 t/lib-httpd/ssl.cnf     |  2 +-
 3 files changed, 12 insertions(+), 32 deletions(-)

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox