Git development
 help / color / mirror / Atom feed
* Re: [DISCUSS] Introducing Rust into the Git project
From: Elijah Newren @ 2024-01-11  2:21 UTC (permalink / raw)
  To: rsbecker; +Cc: Taylor Blau, Junio C Hamano, Dragan Simic, git
In-Reply-To: <008701da442f$b2dfe420$189fac60$@nexbridge.com>

On Wed, Jan 10, 2024 at 5:44 PM <rsbecker@nexbridge.com> wrote:
>
> On Wednesday, January 10, 2024 7:59 PM, Elijah Newren wrote:
[...]
> >Would you be okay with the following alternative: requiring that all Rust code be
> >optional for now?
> >
> >(In other words, allow you to build with USE_RUST=0, or something like that.  And
> >then we have both a Rust and a C implementation of anything that is required for
> >backward compatibility, while any new Rust-only stuff would not be included in
> >your build.)
>
> To address the immediate above, I assume this means that platform maintainers will be responsible for developing non-portable implementations that duplicate Rust functionality

This doesn't at all sound like what I thought I said.  The whole
proposal was so that folks like NonStop could continue using Git with
no more work than setting USE_RUST=0 at build time.

Why do you feel you'd need to duplicate any functionality?

^ permalink raw reply

* Re: [DISCUSS] Introducing Rust into the Git project
From: brian m. carlson @ 2024-01-11  2:55 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Taylor Blau', 'Junio C Hamano',
	'Dragan Simic', git
In-Reply-To: <007c01da4420$10a7b700$31f72500$@nexbridge.com>

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

On 2024-01-10 at 23:52:21, rsbecker@nexbridge.com wrote:
> Unfortunately, none of the compiler frontends listed previously can be
> built for NonStop. These appear to all require gcc either directly or
> transitively, which cannot be ported to NonStop. I do not expect this
> to change any time soon - and is outside of my control anyway. An
> attempt was made to port Rust but it did not succeed primarily because
> of that dependency.

Can you tell us what the technical limitations are that prevent GCC from
being ported so we can understand better?  I know LLVM doesn't support
ia64, which you do support, but GCC is very likely the most portable
compiler on the planet and supports architectures and OSes I've never
otherwise heard of.

I strongly suspect that if GCC did end up on NonStop, Rust would be able
to be ported, too, and you'd also get access to gccgo, which would make
Git LFS possible on NonStop as well[0].

I'm not capable of porting GCC, but I have done some portability work in
the Rust ecosystem, and I'd be willing to provide context and some
assistance (within my time and capabilities) to help get Rust working on
NonStop if you want.

[0] For the record, as a maintainer of Git LFS, I'm happy to accept
portability patches for virtually any OS.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* RE: [DISCUSS] Introducing Rust into the Git project
From: rsbecker @ 2024-01-11  2:57 UTC (permalink / raw)
  To: 'Elijah Newren'
  Cc: 'Taylor Blau', 'Junio C Hamano',
	'Dragan Simic', git
In-Reply-To: <CABPp-BHx=4HPSN4enkHTL7PPnNBsJ1vGWe4Em5imH7HcOcH2PA@mail.gmail.com>

On Wednesday, January 10, 2024 9:21 PM, Elijah Newren wrote:
>On Wed, Jan 10, 2024 at 5:44 PM <rsbecker@nexbridge.com> wrote:
>>
>> On Wednesday, January 10, 2024 7:59 PM, Elijah Newren wrote:
>[...]
>> >Would you be okay with the following alternative: requiring that all
>> >Rust code be optional for now?
>> >
>> >(In other words, allow you to build with USE_RUST=0, or something
>> >like that.  And then we have both a Rust and a C implementation of
>> >anything that is required for backward compatibility, while any new
>> >Rust-only stuff would not be included in your build.)
>>
>> To address the immediate above, I assume this means that platform
>> maintainers will be responsible for developing non-portable
>> implementations that duplicate Rust functionality
>
>This doesn't at all sound like what I thought I said.  The whole proposal was so that
>folks like NonStop could continue using Git with no more work than setting
>USE_RUST=0 at build time.
>
>Why do you feel you'd need to duplicate any functionality?

I think I misunderstood. What I took from this is that all new functionality would be in Rust, which would require a custom implementation in C for platforms that did not have Rust available - if that is even practical. Did I get that wrong?


^ permalink raw reply

* RE: [DISCUSS] Introducing Rust into the Git project
From: rsbecker @ 2024-01-11  3:24 UTC (permalink / raw)
  To: 'brian m. carlson'
  Cc: 'Taylor Blau', 'Junio C Hamano',
	'Dragan Simic', git
In-Reply-To: <ZZ9YrYvW_L9A02aI@tapette.crustytoothpaste.net>

On Wednesday, January 10, 2024 9:56 PM, brian m. carlson wrote:
>On 2024-01-10 at 23:52:21, rsbecker@nexbridge.com wrote:
>> Unfortunately, none of the compiler frontends listed previously can be
>> built for NonStop. These appear to all require gcc either directly or
>> transitively, which cannot be ported to NonStop. I do not expect this
>> to change any time soon - and is outside of my control anyway. An
>> attempt was made to port Rust but it did not succeed primarily because
>> of that dependency.
>
>Can you tell us what the technical limitations are that prevent GCC from being
>ported so we can understand better?  I know LLVM doesn't support ia64, which you
>do support, but GCC is very likely the most portable compiler on the planet and
>supports architectures and OSes I've never otherwise heard of.
>
>I strongly suspect that if GCC did end up on NonStop, Rust would be able to be
>ported, too, and you'd also get access to gccgo, which would make Git LFS possible
>on NonStop as well[0].
>
>I'm not capable of porting GCC, but I have done some portability work in the Rust
>ecosystem, and I'd be willing to provide context and some assistance (within my
>time and capabilities) to help get Rust working on NonStop if you want.
>
>[0] For the record, as a maintainer of Git LFS, I'm happy to accept portability
>patches for virtually any OS.

There are a number of issues for porting gcc (and Go). The list is fairly long, but the summary of what I encountered directly (on the last funded effort of 3) is:
1. There are C syntax constructs required to do anything useful (required for access to the OS API) on NonStop that are not in gcc. I can hand code the parser for that, but it would take time.
2. The Big Endian x86 architecture is weird to gcc and making that work is not easy.
3. There is no assembler on NonStop.
4. The ELF header is very different from standard.
5. The symbol table structure is radically different, so debugging would be (nearly) impossible or impractical. gdb was ported to account for the platform differences.
6. The linkage structure is similar but different from standard.
7. The external fixup structure is radically different.
8. The loader does not work the same way, so there are required sections of the ELF files on NonStop that are not generated by gcc.

There are more, but I just did not get to the point if hitting them. Part of my own issue is that I have expertise in parsing and semantic passes of compilers, but my code generation skills are not where I want them to be for taking on this effort. Our last funded attempt had a code generation expert and he gave up in frustration.

If I was hired on to do this, it might have a chance, but at an estimate (not mine) of 4-5 person years for a gcc port, best case, my $DAYJOB will not permit it.

If gcc could be ported to NonStop, it would solve so many problems. I have heard of numerous failed efforts beyond what was officially funded by various companies, so this is considered a high-risk project.


^ permalink raw reply

* Re: [PATCH 4/6] t1419: mark test suite as files-backend specific
From: Patrick Steinhardt @ 2024-01-11  5:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, git
In-Reply-To: <xmqqcyu9duha.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 856 bytes --]

On Wed, Jan 10, 2024 at 08:27:29AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> On just this (non-technical) part...
> 
> >> > file, but not for loose references. Consequentially, all callers must
> >> > still filter emitted refs with those exclude patterns.
> >> 
> >> s/Consequentially/Consequently/
> >
> > Hum. I had the last time when you mentioned the in mind while writing
> > the commit message, but seemingly misremembered the outcome. So I now
> > looked things up in a dictionary, and both words seem to be used in
> > equivalent ways. As a non-native speaker, could you maybe elaborate a
> > bit to help me out? :)
> 
> As a non-native, I often find this
> 
>   https://trends.google.com/trends/explore?q=consequentially,consequently&hl=en
> 
> fairly useful.

Good idea, thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [DISCUSS] Introducing Rust into the Git project
From: Elijah Newren @ 2024-01-11  5:06 UTC (permalink / raw)
  To: rsbecker; +Cc: Taylor Blau, Junio C Hamano, Dragan Simic, git
In-Reply-To: <009c01da4439$f70beef0$e523ccd0$@nexbridge.com>

On Wed, Jan 10, 2024 at 6:57 PM <rsbecker@nexbridge.com> wrote:
>
> On Wednesday, January 10, 2024 9:21 PM, Elijah Newren wrote:
> >On Wed, Jan 10, 2024 at 5:44 PM <rsbecker@nexbridge.com> wrote:
> >>
> >> On Wednesday, January 10, 2024 7:59 PM, Elijah Newren wrote:
> >[...]
> >> >Would you be okay with the following alternative: requiring that all
> >> >Rust code be optional for now?
> >> >
> >> >(In other words, allow you to build with USE_RUST=0, or something
> >> >like that.  And then we have both a Rust and a C implementation of
> >> >anything that is required for backward compatibility, while any new
> >> >Rust-only stuff would not be included in your build.)
> >>
> >> To address the immediate above, I assume this means that platform
> >> maintainers will be responsible for developing non-portable
> >> implementations that duplicate Rust functionality
> >
> >This doesn't at all sound like what I thought I said.  The whole proposal was so that
> >folks like NonStop could continue using Git with no more work than setting
> >USE_RUST=0 at build time.
> >
> >Why do you feel you'd need to duplicate any functionality?
>
> I think I misunderstood. What I took from this is that all new functionality would be in Rust, which would require a custom implementation in C for platforms that did not have Rust available - if that is even practical. Did I get that wrong?

I think you somehow missed the word optional?

I did say that new functionality should be allowed to be Rust only
(unlike existing functionality), but I'm not sure how you leaped to
assuming that all new functionality would be in Rust.  Further, I also
don't understand why you jump to assuming that all new functionality
needs to be supported on all platforms.  The point of the word
"optional" in my proposal is that it is not required.  So, say, if
git-replay is in Rust, well you've never had git-replay before in any
release, so you haven't lost any functionality by it being implemented
in Rust.  And existing things (merge, cherry-pick, rebase, etc.)
continue working with C-only code.  But you may have one less optional
addition.

At least that was _my_ proposal -- that Rust be optional for now.  It
does differ from what I think Taylor was originally proposing, but
that's why I brought it up as an alternative proposal.

^ permalink raw reply

* Re: [DISCUSS] Introducing Rust into the Git project
From: Dragan Simic @ 2024-01-11  5:33 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Taylor Blau, git
In-Reply-To: <CABPp-BFOmwV-xBtjvtenb6RFz9wx2VWVpTeho0k=D8wsCCVwqQ@mail.gmail.com>

On 2024-01-11 01:12, Elijah Newren wrote:
> Another alternative (as discussed at Git Merge when we were last
> talking about Rust[8]), is requiring all Rust code to be optional for
> now.  If we choose to go that route, I think that means that (a) for
> existing components, we have both a Rust and a C implementation
> available, and (b) for new components (e.g. new top-level commands
> like git-replay), they can be Rust-only and those compiling without
> Rust just don't get them.

To me, this sounds like a horrible option, which is exactly what
I earlier referred to as introducing fragmentation.

^ permalink raw reply

* Re: [DISCUSS] Introducing Rust into the Git project
From: Dragan Simic @ 2024-01-11  5:39 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Taylor Blau, git
In-Reply-To: <CABPp-BH3sva=CNtx8YFGP4Egyau-hR+7njZPFEd-DRTw91BK2w@mail.gmail.com>

On 2024-01-11 01:33, Elijah Newren wrote:
> On Wed, Jan 10, 2024 at 1:57 PM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> 
>> Thus, Git should probably follow the same approach of not converting 
>> the
>> already existing code
> 
> I disagree with this.  I saw significant performance improvements
> through converting some existing Git code to Rust.  Granted, it was
> only a small amount of code, but the performance benefits I saw
> suggested we'd see more by also doing similar conversions elsewhere.
> (Note that I kept the old C code and then conditionally compiled
> either Rust or C versions of what I was converting.)

Well, it's also possible that improving the old C code could also result 
in some performance improvements.  Thus, quite frankly, I don't see that 
as a valid argument to rewrite some existing C code in Rust.

> Further, I found a really old bug from this effort as well[1], and I
> find it extremely unlikely that I would have found that bug otherwise.
> So, converting to Rust can even improve our existing C code.
> 
>> , but frankly, I don't see what would actually be
>> the "new leafs" written in Rust.
> 
> In addition to some of the examples Junio mentioned elsewhere, I think
> new toplevel commands, like git-replay, would qualify.
> 
> 
> [1] Yeah, I really need to dig the patch out and send it in.  I'll do
> so shortly.

^ permalink raw reply

* Re: [PATCH] index-pack: spawn threads atomically
From: Jeff King @ 2024-01-11  6:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git
In-Reply-To: <ZZ7VEVXSg1T8ZkIK@nand.local>

On Wed, Jan 10, 2024 at 12:34:09PM -0500, Taylor Blau wrote:

> > If do go with "--threads=1", I suspect several tests in that file need
> > it.
> 
> Yeah, there are a couple of others. I think the ones that need modifying
> are at the intersection of "expected to fail" and "in a test which is
> expected to pass leak-free":
> 
>     $ grep -l 'TEST_PASSES_SANITIZE_LEAK=true' t????-*.sh |
>       xargs grep -l 'test_must_fail git index-pack'
>     t5302-pack-index.sh
>     t5308-pack-detect-duplicates.sh
>     t5309-pack-delta-cycles.sh
>     t5313-pack-bounds-checks.sh
>     t5325-reverse-index.sh

I think that is more than we need. It's only a problem when we hit a
die() inside a thread, which happens only during delta resolution. So
your patch 2, for example, touches a test which triggers the
--max-input-size check. But we would find that out on the initial
unthreaded pass over the pack.

The one in patch 3 seems at first glance like it might be a problem
(it's another duplicate-object case, like the one of them in 5309). But
it isn't a problem because the duplicate object isn't a delta, so we
notice the problem in write_idx_file() from the main thread (which I
verified by running it under gdb and setting a breakpoint at die()).

I suspect patch 4 is the same, but didn't run gdb on each case. And
patch 5 is about a corrupt reverse index, so almost certainly the main
thread. So I suspect that patch 1 is the only one that matters here (and
probably all of those are needed, because it is all about broken
deltas).

All that said, I am on the fence between the two approaches. If Junio
prefers the atomic-spawn direction, I'm fine with that, and there's not
much point in polishing the --threads=1 approach further.

-Peff

^ permalink raw reply

* Re: [DISCUSS] Introducing Rust into the Git project
From: Patrick Steinhardt @ 2024-01-11  6:56 UTC (permalink / raw)
  To: Elijah Newren; +Cc: rsbecker, Taylor Blau, Junio C Hamano, Dragan Simic, git
In-Reply-To: <CABPp-BGmXw0NQ8yBaMiVXHiKr0-Y_jkZWmJB1CG_oc4UGxt_gA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4216 bytes --]

On Wed, Jan 10, 2024 at 09:06:23PM -0800, Elijah Newren wrote:
> On Wed, Jan 10, 2024 at 6:57 PM <rsbecker@nexbridge.com> wrote:
> >
> > On Wednesday, January 10, 2024 9:21 PM, Elijah Newren wrote:
> > >On Wed, Jan 10, 2024 at 5:44 PM <rsbecker@nexbridge.com> wrote:
> > >>
> > >> On Wednesday, January 10, 2024 7:59 PM, Elijah Newren wrote:
> > >[...]
> > >> >Would you be okay with the following alternative: requiring that all
> > >> >Rust code be optional for now?
> > >> >
> > >> >(In other words, allow you to build with USE_RUST=0, or something
> > >> >like that.  And then we have both a Rust and a C implementation of
> > >> >anything that is required for backward compatibility, while any new
> > >> >Rust-only stuff would not be included in your build.)
> > >>
> > >> To address the immediate above, I assume this means that platform
> > >> maintainers will be responsible for developing non-portable
> > >> implementations that duplicate Rust functionality
> > >
> > >This doesn't at all sound like what I thought I said.  The whole proposal was so that
> > >folks like NonStop could continue using Git with no more work than setting
> > >USE_RUST=0 at build time.
> > >
> > >Why do you feel you'd need to duplicate any functionality?
> >
> > I think I misunderstood. What I took from this is that all new functionality would be in Rust, which would require a custom implementation in C for platforms that did not have Rust available - if that is even practical. Did I get that wrong?
> 
> I think you somehow missed the word optional?
> 
> I did say that new functionality should be allowed to be Rust only
> (unlike existing functionality), but I'm not sure how you leaped to
> assuming that all new functionality would be in Rust.  Further, I also
> don't understand why you jump to assuming that all new functionality
> needs to be supported on all platforms.  The point of the word
> "optional" in my proposal is that it is not required.  So, say, if
> git-replay is in Rust, well you've never had git-replay before in any
> release, so you haven't lost any functionality by it being implemented
> in Rust.  And existing things (merge, cherry-pick, rebase, etc.)
> continue working with C-only code.  But you may have one less optional
> addition.
> 
> At least that was _my_ proposal -- that Rust be optional for now.  It
> does differ from what I think Taylor was originally proposing, but
> that's why I brought it up as an alternative proposal.

There are two ways to do this that I can see:

  - New features may not be available on some platforms. I think this is
    what Elijah had in mind.

  - New features may require two implementations, one in C and one in
    Rust. I think this is what Randall understood.

Ultimately, I think both alternatives would end up demoting platforms
that do not support Rust to become second-class citizens eventually.
This demotion is rather obvious in the case where new features may not
be available. But I also think that the second approach, where we
provide two implementations, would lead to a demotion of the Rust-less
platform because the alternate implementation in C would likely end up
receiving less attention than the Rust-based one. It's thus likely that
the implementation receiving less attention will deteriorate in code
quality.

I also think that once we start to accept Rust code, it will only be a
matter of time before we want to start using it in central code paths.
Rust does provide interfaces which are a lot nicer to use than the C
based ones, but it's hard to really reap the benefits unless we start to
embrace Rust fully. Also, the most complex interfaces tend to be those
which are deep inside our code base, like for example the object
database. They are thus also the most profitable targets for a Rust
conversion, even though likely also the hardest to realize.

To me this feels like a slippery slope, and the deeper we go the more
incentive we will have to drop platforms which do not support Rust
altogether. So I can certainly see where Randall is coming from and why
this proposal is not something that he is thrilled about.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Limited operations in unsafe repositories
From: Jeff King @ 2024-01-11  7:01 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git
In-Reply-To: <ZZ8pbAMNaBDFgf3G@tapette.crustytoothpaste.net>

On Wed, Jan 10, 2024 at 11:34:04PM +0000, brian m. carlson wrote:

> On 2024-01-10 at 12:05:31, Jeff King wrote:
> > My thinking is to flip that around: run all code, but put protection in
> > the spots that do unsafe things, like loading config or examining
> > hooks. I.e., a patch like this:
> 
> I think that's much what I had intended to do with not invoking binaries
> at all, except that it was limited to rev-parse.  I wonder if perhaps we
> could do something similar if we had the `--assume-unsafe` argument you
> proposed, except that we would only allow the `git` binary and always
> pass that argument to it in such a case.

I'm not sure what you mean by "invoking binaries". I had assumed that
meant just running other Git sub-processes. But if "--assume-unsafe" is
just setting an environment variable, they'd automatically be protected.

> I don't think reading config is intrinsically unsafe; it's more of what
> we do with it, which is spawning external processes, that's the problem.
> I suppose an argument could be made for injecting terminal sequences or
> such, though.  Hooks, obviously, are definitely unsafe.

Right, it's not config itself that's unsafe; it's that many options are.
We could try to annotate them to say "it is OK to parse core.eol but not
core.pager", presumably with an allow-known-good approach (since so many
ard bad!). But that feels like an ongoing maintenance headache, and an
easy way to make a mistake (your mention of terminal sequences makes me
assume you're thinking of "color.diff.*", etc). A rule like "we do not
read repo-level config at all" seems easier to explain (to me, anyway).

-Peff

^ permalink raw reply

* Re: [PATCH 1/2] t1401: generalize reference locking
From: Jeff King @ 2024-01-11  7:13 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler
In-Reply-To: <cb78b549e5e826ffef39c55bd726164e6b7bb755.1704912750.git.gitgitgadget@gmail.com>

On Wed, Jan 10, 2024 at 06:52:29PM +0000, Justin Tobler via GitGitGadget wrote:

> From: Justin Tobler <jltobler@gmail.com>
> 
> Some tests set up reference locks by directly creating the lockfile.
> While this works for the files reference backend, reftable reference
> locks operate differently and are incompatible with this approach.
> Refactor the test to use git-update-ref(1) to lock refs instead so that
> the test does not need to be aware of how the ref backend locks refs.

It looks like you re-create this situation in a backend-agnostic way by
having two simultaneous updates that conflict on the lock (but don't
care how that lock is implemented).

That works, but I think we could keep it simple. This test doesn't care
about the exact error condition we create. The point was just to die in
create_symref() and make sure the exit code was propagated. So something
like this would work:

  $ git symbolic-ref refs/heads refs/heads/foo
  error: unable to write symref for refs/heads: Is a directory

(note that you get a different error message if the refs are packed,
since there we can notice the d/f conflict manually).

There may be other ways to stimulate a failure. I thought "symbolic-ref
HEAD refs/heads/.invalid" might work, but sadly the refname format check
happens earlier.

I think it is worth avoiding the fifo magic if we can. It's complicated,
and it means that not all platforms run the test.

-Peff

^ permalink raw reply

* Re: Limited operations in unsafe repositories
From: Patrick Steinhardt @ 2024-01-11  7:17 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git
In-Reply-To: <20240111070114.GB48154@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

On Thu, Jan 11, 2024 at 02:01:14AM -0500, Jeff King wrote:
> On Wed, Jan 10, 2024 at 11:34:04PM +0000, brian m. carlson wrote:
> 
> > On 2024-01-10 at 12:05:31, Jeff King wrote:
> > > My thinking is to flip that around: run all code, but put protection in
> > > the spots that do unsafe things, like loading config or examining
> > > hooks. I.e., a patch like this:
> > 
> > I think that's much what I had intended to do with not invoking binaries
> > at all, except that it was limited to rev-parse.  I wonder if perhaps we
> > could do something similar if we had the `--assume-unsafe` argument you
> > proposed, except that we would only allow the `git` binary and always
> > pass that argument to it in such a case.
> 
> I'm not sure what you mean by "invoking binaries". I had assumed that
> meant just running other Git sub-processes. But if "--assume-unsafe" is
> just setting an environment variable, they'd automatically be protected.
> 
> > I don't think reading config is intrinsically unsafe; it's more of what
> > we do with it, which is spawning external processes, that's the problem.
> > I suppose an argument could be made for injecting terminal sequences or
> > such, though.  Hooks, obviously, are definitely unsafe.
> 
> Right, it's not config itself that's unsafe; it's that many options are.
> We could try to annotate them to say "it is OK to parse core.eol but not
> core.pager", presumably with an allow-known-good approach (since so many
> ard bad!). But that feels like an ongoing maintenance headache, and an
> easy way to make a mistake (your mention of terminal sequences makes me
> assume you're thinking of "color.diff.*", etc). A rule like "we do not
> read repo-level config at all" seems easier to explain (to me, anyway).

With the exemption of the repository format, I assume? We have to parse
things like `core.repositoryFormatVersion` and extensions in order to
figure out how a repository has to be accessed. So I agree that we
should not partition config based on safeness, which is going to be a
headache as you rightly point out. But we can partition based on whether
or not config is required in order to access the repository, where the
set of relevant config keys is a whole lot smaller.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] t5541: generalize reference locking
From: Jeff King @ 2024-01-11  7:28 UTC (permalink / raw)
  To: Justin Tobler via GitGitGadget; +Cc: git, Justin Tobler
In-Reply-To: <11fd5091d61b54d8862ab2e316bbd25fff63ce0f.1704912750.git.gitgitgadget@gmail.com>

On Wed, Jan 10, 2024 at 06:52:30PM +0000, Justin Tobler via GitGitGadget wrote:

> From: Justin Tobler <jltobler@gmail.com>
> 
> Some tests set up reference locks by directly creating the lockfile.
> While this works for the files reference backend, reftable reference
> locks operate differently and are incompatible with this approach.
> Generalize reference locking by preparing a reference transaction.

As with the first patch, I think we could use d/f conflicts to get the
same effect. Perhaps something like this:

diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index df758e187d..7eb0e887e1 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -233,7 +233,8 @@ test_expect_success 'push --atomic fails on server-side errors' '
 	up="$HTTPD_URL"/smart/atomic-branches.git &&
 
 	# break ref updates for other on the remote site
-	mkdir "$d/refs/heads/other.lock" &&
+	git -C "$d" update-ref -d refs/heads/other &&
+	git -C "$d" update-ref refs/heads/other/block-me HEAD &&
 
 	# add the new commit to other
 	git branch -f other collateral &&
@@ -244,12 +245,8 @@ test_expect_success 'push --atomic fails on server-side errors' '
 	# the new branch should not have been created upstream
 	test_must_fail git -C "$d" show-ref --verify refs/heads/atomic &&
 
-	# upstream should still reflect atomic2, the last thing we pushed
-	# successfully
-	git rev-parse atomic2 >expected &&
-	# ...to other.
-	git -C "$d" rev-parse refs/heads/other >actual &&
-	test_cmp expected actual &&
+	# upstream should not have updated, as it cannot be written at all.
+	test_must_fail git -C "$d" rev-parse --verify refs/heads/other &&
 
 	# the new branch should not have been created upstream
 	test_must_fail git -C "$d" show-ref --verify refs/heads/atomic &&

I do think that the original was slightly more interesting (since we
could check that "other" still existed but was not updated), but I think
the main point of the test is that "atomic" was not pushed at all.

-Peff

^ permalink raw reply related

* Re: Limited operations in unsafe repositories
From: Jeff King @ 2024-01-11  7:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: brian m. carlson, git
In-Reply-To: <ZZ-V-vwnm2hOkrMC@tanuki>

On Thu, Jan 11, 2024 at 08:17:14AM +0100, Patrick Steinhardt wrote:

> > Right, it's not config itself that's unsafe; it's that many options are.
> > We could try to annotate them to say "it is OK to parse core.eol but not
> > core.pager", presumably with an allow-known-good approach (since so many
> > ard bad!). But that feels like an ongoing maintenance headache, and an
> > easy way to make a mistake (your mention of terminal sequences makes me
> > assume you're thinking of "color.diff.*", etc). A rule like "we do not
> > read repo-level config at all" seems easier to explain (to me, anyway).
> 
> With the exemption of the repository format, I assume? We have to parse
> things like `core.repositoryFormatVersion` and extensions in order to
> figure out how a repository has to be accessed. So I agree that we
> should not partition config based on safeness, which is going to be a
> headache as you rightly point out. But we can partition based on whether
> or not config is required in order to access the repository, where the
> set of relevant config keys is a whole lot smaller.

Right. See the pseudo-patch I posted earlier. I think we just want to
touch do_git_config_sequence(), which leaves repo discovery and stuff
like "git config --local" working.

-Peff

^ permalink raw reply

* Re: [PATCH 1/4] reftable/stack: refactor stack reloading to have common exit path
From: Patrick Steinhardt @ 2024-01-11  7:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqqmstdccz8.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 796 bytes --]

On Wed, Jan 10, 2024 at 09:30:51AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The `reftable_stack_reload_maybe_reuse()` function is responsible for
> > reloading the reftable list from disk. The function is quite hard to
> > follow though because it has a bunch of different exit paths, many of
> > which have to free the same set of resources.
> >
> > Refactor the function to have a common exit path. While at it, touch up
> > the style of this function a bit to match our usual coding style better.
> > ---
> >  reftable/stack.c | 86 +++++++++++++++++++++++-------------------------
> >  1 file changed, 42 insertions(+), 44 deletions(-)
> 
> Missing sign-off.

Ah, sorry, I forgot my usual `git rebase --signoff`. Will fix in v2.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 3/4] reftable/stack: use stat info to avoid re-reading stack list
From: Patrick Steinhardt @ 2024-01-11  7:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqqo7dt9ckn.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

On Wed, Jan 10, 2024 at 12:07:52PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We can do better and use the same stat(3P)-based mechanism that the
> > "packed" backend uses. Instead of reading the file, we will only open
> > the file descriptor, fstat(3P) it, and then compare the info against the
> > cached value from the last time we have updated the stack. This should
> > always work alright because "tables.list" is updated atomically via a
> > rename, so even if the ctime or mtime wasn't granular enough to identify
> > a change, at least the inode number should have changed.
> 
> Or the file size.  Let's keep in mind that many users get useless
> inum from their filesystem X-<.
> 
> >   Summary
> >     update-ref: create many refs (refcount = 1, revision = HEAD~) ran
> >       1.01 ± 0.09 times faster than update-ref: create many refs (refcount = 1, revision = HEAD)
> >       2.72 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
> >       3.42 ± 0.13 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
> >     163.59 ± 5.62 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
> >     233.91 ± 7.92 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
> > ---
> 
> Nice.
> 
> > @@ -374,6 +375,8 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
> >  		sleep_millisec(delay);
> >  	}
> >  
> > +	stat_validity_update(&st->list_validity, fd);
> > +
> >  out:
> >  	if (fd >= 0)
> >  		close(fd);
> 
> The stat_validity_update() does not happen in the error codepath.
> 
> Should we be clearing the validity of the list when somebody jumps
> to "out:" due to an error?  Or by the time this function gets
> called, the caller would already have cleared the validity and an
> error that jumps to "out:" keeps the list invalid?

It likely does not matter much. This function only gets called when we
have previously determined the stack to be out of date already. So
unless we update the validity cache, we know that the next validity
check would continue to treat the stack as outdated.

That being said I think it's good hygiene to clear the validity cache
regardless of that.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] commit-graph: retain commit slab when closing NULL commit_graph
From: Jeff King @ 2024-01-11  7:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Scott Leggett
In-Reply-To: <xmqq34v5dtz9.fsf@gitster.g>

On Wed, Jan 10, 2024 at 08:38:18AM -0800, Junio C Hamano wrote:

> > It should be easy-ish to iterate through the slab and look at the
> > commits that are mentioned in it. Though maybe not? Each commit knows
> > its slab-id, but I'm not sure if we have a master list of commits to go
> > the other way.
> 
> We have table of all in-core objects, don't we?

Oh, duh. Yes, we could iterate over obj_hash. I do think the "on demand"
version I showed later in the message is better, though, as the work
both scales with the number of affected commits (rather than the total
number of objects) and can be done lazily (so callers that are not buggy
pay no price at all).

> > +	 * This will throw away the parents list, which is potentially sketchy.
> > +	 * A better version of this would just unset commit->object.parsed
> > +	 * and then do a minimal version of parse_commit() that _just_ loads
> > +	 * the tree data (and/or graph position if available).
> 
> Yeah, it is a concern that we may be working with an in-core commit
> object whose parent list has already been rewritten during revision
> traversal.  Well thought out.

Hmm. Looking at my "a better version" sentence, I guess we know that
either:

  1. The object really is corrupt (i.e., we could not load the tree).

  2. It came from a graph in the first place.

So what if we just tried harder to look it up in the graph file (rather
than the slab) when we see COMMIT_NOT_FROM_GRAPH? And indeed, we even
have a function to do this already!

So something like this almost works:

diff --git a/commit.c b/commit.c
index b3223478bc..096a3d18d0 100644
--- a/commit.c
+++ b/commit.c
@@ -418,10 +418,12 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
 struct tree *repo_get_commit_tree(struct repository *r,
 				  const struct commit *commit)
 {
+	uint32_t pos;
+
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
+	if (repo_find_commit_pos_in_graph(r, commit, &pos))
 		return get_commit_tree_in_graph(r, commit);
 
 	return NULL;

but:

  1. It doesn't update the slab, so get_commit_tree_in_graph() then
     complains immediately, because it uses only
     commit_graph_position(). I guess we'd need to do:

       commit_graph_data_at(commit)->graph_pos = pos;

     somewhere. That does make the recently-found segfault go away.
     But...

  2. I'm not sure if other spots would want similar treatment. We store
     generation numbers in the slab, too. I think we'll figure things
     out when they're not available, so there's no segfault problem. But
     it's maybe a potential performance issue. Likewise, it is probably
     a bug that the graph-writing code is looking at this commit at all
     (since we know it is already in a graph file). So we might be
     papering over that bug (that is, even though the segfault is gone,
     we are perhaps still writing a duplicate graph entry).

> > I dunno. I do feel like this is a lurking maintenance headache, and
> > might even be a triggerable bug. But without knowing of a way that it
> > happens in the current code base, it feels like it would be easy to make
> > things worse rather than better.
> 
> Unfortunately I share the feeling X-<.

I somehow sniped myself into thinking about it more, but that has only
reinforced my feeling that I'm afraid to touch it. ;)

-Peff

^ permalink raw reply related

* Re: [PATCH 3/3] advice: allow disabling the automatic hint in advise_if_enabled()
From: Jeff King @ 2024-01-11  8:04 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Junio C Hamano, Rubén Justo, Git List
In-Reply-To: <d6d72ec5431ad1ca775e6e4e9921f94c@manjaro.org>

On Wed, Jan 10, 2024 at 06:45:49PM +0100, Dragan Simic wrote:

> 4) As a careful git user that remembers important things, you go back
> to your git configuration file and set core.verboseAdvice to true, and
> the additional advices are back, telling you how to disable the
> unwanted advice.
> 
> 5) After you disable the unwanted advice, you set core.verboseAdvice
> back to false and keep it that way until the next redundant advice
> pops up.
> 
> However, I do see that this approach has its downsides, for example
> the need for the unwanted advice to be displayed again together with
> the additional advice, by executing the appropriate git commands,
> after the above-described point #4.

Right, the need to re-trigger the advice is the problematic part here, I
think. In some cases it is easy. But in others, especially commands
which mutate the repo state (like the empty-commit rebase that started
this thread), you may need to do a lot of work to recreate the
situation.

> Let's see what it would look like with the granular, per-advice on/off
> knobs:
> 
> 1) You use git and find some advice useful, so you decide to keep it
> displayed.  However, the additional advice about turning the advice
> off becomes annoying a bit, or better said, it becomes redundant
> because the main advice stays.
> 
> 2) As a result, you follow the additional advice and set the specific
> knob to false, and voila, the redundant additional advice disappears.
> Of course, it once again isn't perfect, as the next point will clearly
> show.
> 
> 3) You keep using git, and one of the advices that you previously used
> to find useful becomes no longer needed.  But, what do you do, where's
> that helpful additional advice telling you how to turn the advice off?
> 
> 4) Now you need to dig though the manual pages, or to re-enable the
> additional advices in your git configuration file, perhaps all of them
> at once, while keeping a backup of your original settings, to restore
> it later.  Then, you again need to wait until the original advice gets
> displayed.

These steps (3) and (4) seem unlikely to me. These are by definition
messages you have seen before and decided to configure specifically (to
"always", and not just "off"). So you probably only have a handful (if
even more than one) of them in your config file.

Whereas for the case I am worried about, you are exposed to a _new_
message that you haven't seen before (and is maybe even new to Git!),
from the much-larger pool of "all advice messages Git knows about".

It's possible we could implement both mechanisms and let people choose
which one they want, depending on their preferences. It's not very much
code. I just hesitate to make things more complex than they need to be.

-Peff

^ permalink raw reply

* [PATCH] builtin/revert.c: refactor using an enum for cmd-action
From: Michael Lohmann @ 2024-01-11  8:04 UTC (permalink / raw)
  To: git; +Cc: Michael Lohmann, Wanja Henze

This is done to avoid having to keep the char values in sync in
different places and also to get compiler warnings on non-exhaustive
switches.

The newly introduced `revert_action`-enum aligns with the
builtin/rebase.c `action`-enum though the name `revert_action` is chosen
to better differentiate it from `replay_opts->action` with a different
function. For rebase the equivalent of the latter is
`rebase_options->type` and `rebase_options->action` corresponds to the
`cmd` variable in revert.c.

In the rebase `action` enum there is the enumeration constant
`ACTION_NONE` which is not particularly descriptive, since it seems to
imply that no action should be taken. Instead it signals a start of a
revert/cherry-pick process, so here `REVERT_ACTION_START` was chosen.

Co-authored-by: Wanja Henze <wanja.hentze@bevuta.com>
Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
---
Hello!

When I was working on `revert/cherry-pick --show-current-patch` (still
needs a little bit more discussion, if actually wanted, see thread [1])
I found the construct with the `cmd` as an int a bit irritating. I hope
this patch makes it more obvious what is actually going on.

Is there a reason why `ACTION_NONE` was chosen as a name in
builtin/rebase.c? My best guess is that it came along because it is the
implied action when no other specific action is passed in, but I don't
find that particularly descriptive on what its actual function is...
(Yes, naming things is hard... :D)

An alternative to prefixing the enum name with "revert_" would be to
rename `replay_opts->action` to `replay_opts->type` so it aligns with
rebase. Would you prefer that instead?

Cheers
Michael

[1] https://lore.kernel.org/git/20231218121048.68290-1-mi.al.lohmann@gmail.com/

 builtin/revert.c | 80 +++++++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 31 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 89821bab95..b5278b7a3b 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -20,6 +20,14 @@
  * Copyright (c) 2005 Junio C Hamano
  */
 
+enum revert_action {
+	REVERT_ACTION_START = 0,
+	REVERT_ACTION_CONTINUE,
+	REVERT_ACTION_SKIP,
+	REVERT_ACTION_ABORT,
+	REVERT_ACTION_QUIT,
+};
+
 static const char * const revert_usage[] = {
 	N_("git revert [--[no-]edit] [-n] [-m <parent-number>] [-s] [-S[<keyid>]] <commit>..."),
 	N_("git revert (--continue | --skip | --abort | --quit)"),
@@ -85,12 +93,12 @@ static int run_sequencer(int argc, const char **argv, const char *prefix,
 	const char * const * usage_str = revert_or_cherry_pick_usage(opts);
 	const char *me = action_name(opts);
 	const char *cleanup_arg = NULL;
-	int cmd = 0;
+	enum revert_action cmd = REVERT_ACTION_START;
 	struct option base_options[] = {
-		OPT_CMDMODE(0, "quit", &cmd, N_("end revert or cherry-pick sequence"), 'q'),
-		OPT_CMDMODE(0, "continue", &cmd, N_("resume revert or cherry-pick sequence"), 'c'),
-		OPT_CMDMODE(0, "abort", &cmd, N_("cancel revert or cherry-pick sequence"), 'a'),
-		OPT_CMDMODE(0, "skip", &cmd, N_("skip current commit and continue"), 's'),
+		OPT_CMDMODE(0, "quit", &cmd, N_("end revert or cherry-pick sequence"), REVERT_ACTION_QUIT),
+		OPT_CMDMODE(0, "continue", &cmd, N_("resume revert or cherry-pick sequence"), REVERT_ACTION_CONTINUE),
+		OPT_CMDMODE(0, "abort", &cmd, N_("cancel revert or cherry-pick sequence"), REVERT_ACTION_ABORT),
+		OPT_CMDMODE(0, "skip", &cmd, N_("skip current commit and continue"), REVERT_ACTION_SKIP),
 		OPT_CLEANUP(&cleanup_arg),
 		OPT_BOOL('n', "no-commit", &opts->no_commit, N_("don't automatically commit")),
 		OPT_BOOL('e', "edit", &opts->edit, N_("edit the commit message")),
@@ -144,30 +152,37 @@ static int run_sequencer(int argc, const char **argv, const char *prefix,
 	}
 
 	/* Check for incompatible command line arguments */
-	if (cmd) {
-		char *this_operation;
-		if (cmd == 'q')
+	{
+		char *this_operation = 0;
+		switch (cmd) {
+		case REVERT_ACTION_START:
+			break;
+		case REVERT_ACTION_QUIT:
 			this_operation = "--quit";
-		else if (cmd == 'c')
+			break;
+		case REVERT_ACTION_CONTINUE:
 			this_operation = "--continue";
-		else if (cmd == 's')
+			break;
+		case REVERT_ACTION_SKIP:
 			this_operation = "--skip";
-		else {
-			assert(cmd == 'a');
+			break;
+		case REVERT_ACTION_ABORT:
 			this_operation = "--abort";
+			break;
 		}
 
-		verify_opt_compatible(me, this_operation,
-				"--no-commit", opts->no_commit,
-				"--signoff", opts->signoff,
-				"--mainline", opts->mainline,
-				"--strategy", opts->strategy ? 1 : 0,
-				"--strategy-option", opts->xopts.nr ? 1 : 0,
-				"-x", opts->record_origin,
-				"--ff", opts->allow_ff,
-				"--rerere-autoupdate", opts->allow_rerere_auto == RERERE_AUTOUPDATE,
-				"--no-rerere-autoupdate", opts->allow_rerere_auto == RERERE_NOAUTOUPDATE,
-				NULL);
+		if (this_operation)
+			verify_opt_compatible(me, this_operation,
+					"--no-commit", opts->no_commit,
+					"--signoff", opts->signoff,
+					"--mainline", opts->mainline,
+					"--strategy", opts->strategy ? 1 : 0,
+					"--strategy-option", opts->xopts.nr ? 1 : 0,
+					"-x", opts->record_origin,
+					"--ff", opts->allow_ff,
+					"--rerere-autoupdate", opts->allow_rerere_auto == RERERE_AUTOUPDATE,
+					"--no-rerere-autoupdate", opts->allow_rerere_auto == RERERE_NOAUTOUPDATE,
+					NULL);
 	}
 
 	if (!opts->strategy && opts->default_strategy) {
@@ -183,9 +198,7 @@ static int run_sequencer(int argc, const char **argv, const char *prefix,
 				"--edit", opts->edit > 0,
 				NULL);
 
-	if (cmd) {
-		opts->revs = NULL;
-	} else {
+	if (cmd == REVERT_ACTION_START) {
 		struct setup_revision_opt s_r_opt;
 		opts->revs = xmalloc(sizeof(*opts->revs));
 		repo_init_revisions(the_repository, opts->revs, NULL);
@@ -198,6 +211,8 @@ static int run_sequencer(int argc, const char **argv, const char *prefix,
 		memset(&s_r_opt, 0, sizeof(s_r_opt));
 		s_r_opt.assume_dashdash = 1;
 		argc = setup_revisions(argc, argv, opts->revs, &s_r_opt);
+	} else {
+		opts->revs = NULL;
 	}
 
 	if (argc > 1)
@@ -210,19 +225,22 @@ static int run_sequencer(int argc, const char **argv, const char *prefix,
 		opts->strategy = xstrdup(getenv("GIT_TEST_MERGE_ALGORITHM"));
 	free(options);
 
-	if (cmd == 'q') {
+	switch (cmd) {
+	case REVERT_ACTION_QUIT: {
 		int ret = sequencer_remove_state(opts);
 		if (!ret)
 			remove_branch_state(the_repository, 0);
 		return ret;
 	}
-	if (cmd == 'c')
+	case REVERT_ACTION_CONTINUE:
 		return sequencer_continue(the_repository, opts);
-	if (cmd == 'a')
+	case REVERT_ACTION_ABORT:
 		return sequencer_rollback(the_repository, opts);
-	if (cmd == 's')
+	case REVERT_ACTION_SKIP:
 		return sequencer_skip(the_repository, opts);
-	return sequencer_pick_revisions(the_repository, opts);
+	case REVERT_ACTION_START:
+		return sequencer_pick_revisions(the_repository, opts);
+	}
 }
 
 int cmd_revert(int argc, const char **argv, const char *prefix)
-- 
2.42.0


^ permalink raw reply related

* gitweb: Exiting subroutine via next at /usr/lib/cgi-bin/gitweb.cgi line 3142
From: martin f krafft @ 2024-01-11  8:28 UTC (permalink / raw)
  To: git

Hi team,

I get tens of thousands of messages like

```
[Thu Jan 11 09:01:11 2024] gitweb.cgi: Exiting subroutine via next at /usr/lib/cgi-bin/gitweb.cgi line 3142.
```

in my logs, presumably from crawlers hitting the CGI wrongly. This 
is using gitweb 2.39.2.

I know no Perl whatsoever, but it's my impression that the following 
patch would solve this:

```
--- /tmp/gitweb.cgi	2024-01-11 09:07:47.283764508 +0100
+++ /usr/lib/cgi-bin/gitweb.cgi	2024-01-11 09:01:43.459218235 +0100
@@ -3139,7 +3139,7 @@
  				my $path = substr($File::Find::name, $pfxlen + 1);
  				# paranoidly only filter here
  				if ($paranoid && $filter && $path !~ m!^\Q$filter\E/!) {
-					next;
+					return;
  				}
  				# we check related file in $projectroot
  				if (check_export_ok("$projectroot/$path")) {
```

This is within the wanted sub for `File::Find::find`, and it returns 
next so as to skip the currently visited file during the tree walk. 
Returning a false value, which `return` without an argument 
presumably does, should have the same effect.

Cheers,

-- 
martin krafft | https://matrix.to/#/#madduck:madduck.net
 
"it usually takes more than three weeks
 to prepare a good impromptu speech.
                                                       -- mark twain
{: .blockquote }
 
spamtraps: madduck.bogus@madduck.net
{: .hidden }


^ permalink raw reply

* Re: [PATCH 4/4] reftable/blocksource: use mmap to read tables
From: Patrick Steinhardt @ 2024-01-11  9:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqq5y00anlj.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 5948 bytes --]

On Wed, Jan 10, 2024 at 01:24:24PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Using mmap comes with a significant drawback on Windows though, because
> > mmapped files cannot be deleted and neither is it possible to rename
> > files onto an mmapped file. But for one, the reftable library gracefully
> > handles the case where auto-compaction cannot delete a still-open stack
> > already and ignores any such errors. Also, `reftable_stack_clean()` will
> > prune stale tables which are not referenced by "tables.list" anymore so
> > that those files can eventually be pruned. And second, we never rewrite
> > already-rewritten stacks, so it does not matter that we cannot rename a
> > file over an mmaped file, either.
> 
> I somehow thought that we use "read into an allocated memory and
> pretend as if we mapped" emulation on Windows, so all of that may be
> moot.

Ah, you're right in fact. I missed the definition of `git_mmap()` and
`git_munmap()`.

> > diff --git a/reftable/blocksource.c b/reftable/blocksource.c
> > index a1ea304429..5d3f3d264c 100644
> > --- a/reftable/blocksource.c
> > +++ b/reftable/blocksource.c
> > @@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
> >  #include "reftable-blocksource.h"
> >  #include "reftable-error.h"
> >  
> > +#if defined(NO_MMAP)
> > +static int use_mmap = 0;
> > +#else
> > +static int use_mmap = 1;
> > +#endif
> 
> Is there (do you need) some externally controllable knob that the
> user can use to turn this off on a system that is compiled without
> NO_MMAP?  Otherwise it is misleading to have this as a variable.

No. The benefit of using a variable instead of using ifdefs is that we
know that both code paths continue to compile just fine. We do the same
thing in "refs/packed-backend.c".

But this code is not needed at all anyway, as you pointed out, because
we can simply use the emulated mmap(3P) code.

> > +static void file_close(void *v)
> >  {
> > -	int fd = ((struct file_block_source *)b)->fd;
> > -	if (fd > 0) {
> > -		close(fd);
> > -		((struct file_block_source *)b)->fd = 0;
> > +	struct file_block_source *b = v;
> > +
> > +	if (b->fd >= 0) {
> > +		close(b->fd);
> > +		b->fd = -1;
> >  	}
> >  
> > +	if (use_mmap)
> > +		munmap(b->data, b->size);
> > +	else
> > +		reftable_free(b->data);
> > +	b->data = NULL;
> > +
> >  	reftable_free(b);
> >  }
> 
> It would have been nicer to do this kind of "a void pointer is taken
> and we immediately cast it to a concrete and useful type upon entry"
> clean-up as a separate step that is purely clean-up, if there were
> many instances.  It is the first such one in the series as far as I
> remember, so it is not a huge deal.
> 
> > @@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
> >  {
> >  	struct file_block_source *b = v;
> >  	assert(off + size <= b->size);
> > -	dest->data = reftable_malloc(size);
> > -	if (pread_in_full(b->fd, dest->data, size, off) != size)
> > -		return -1;
> > +	dest->data = b->data + off;
> >  	dest->len = size;
> >  	return size;
> >  }
> 
> So, we used to delay the allocation and reading of a block until
> this function gets called.  Now, by the time the control reaches
> the function, we are expected to have the data handy at b->data.
> That is ensured by reftable_block_source_from_file(), I presume?
> 
> > @@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
> >  {
> >  	struct stat st = { 0 };
> >  	int err = 0;
> > -	int fd = open(name, O_RDONLY);
> > +	int fd;
> >  	struct file_block_source *p = NULL;
> > +
> > +	fd = open(name, O_RDONLY);
> >  	if (fd < 0) {
> >  		if (errno == ENOENT) {
> >  			return REFTABLE_NOT_EXIST_ERROR;
> 
> This is a no-op clean-up that would have been better in a separate
> clean-up step.  Again, not a huge deal.

Yeah, fair enough. One problem I have with the reftable library is that
its coding style doesn't quite blend in with the rest of the Git code
base, so I want to refactor code that I'm touching to match our coding
style better. This means that there will be a lot of such smallish
refactorings, and I'm often not sure whether to evict them into their
own commit or whether to do the refactoring in the same step.

For small changes like this one here I think it's reasonable to do so in
the same commit. But larger refactorings like e.g. the introduction of
the common exit path in the first patch of this series I of course put
into their own commit.

> > @@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
> >  
> >  	p = reftable_calloc(sizeof(struct file_block_source));
> >  	p->size = st.st_size;
> > -	p->fd = fd;
> > +	if (use_mmap) {
> > +		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> > +		p->fd = fd;
> 
> This is a bit unusual for our use of mmap() where the norm is to
> close the file descriptor once we mapped (we only need the memory
> region itself and not the originating file descriptor to unmap it).
> 
> Is there a reason why we need to keep p->fd?  Now the other side of
> this if/else preallocates the whole thing and slurps everything in
> core to allow the remainder of the code to mimic what happens on the
> mmaped block memory (e.g., we saw how file_read_block() assumes that
> b->data[0..b->size] are valid) and does not obviously need p->fd,
> can we just remove .fd member from the file_block_source structure?
> 
> That way, file_close() can also lose the conditional close() call.
> 
> For that matter, do we need any codepath in this file that is
> enabled by !use_mmap?  Can't we just use xmmap() and let its
> "instead, we allocate, read into it and return" emulation?

Not really, I'll update the code accordingly.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 0/5] reftable: optimize I/O patterns
From: Patrick Steinhardt @ 2024-01-11 10:06 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704714575.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 15504 bytes --]

Hi,

this is the second version of my patch series that optimizes some I/O
patterns that the reftable library uses. Changes compared to v1:

  - Added missing signoffs.

  - The validity cache is now cleared when reloading the reftable stack
    fails.

  - Style fixes for `reftable_block_source_from_file()` are now in a
    separate commit.

  - We now use `xmmap()` exclusively, relying on its fallback code on
    NO_MMAP platforms.

  - The file descriptor is closed immediately after mmapping now.

Thanks for your review, Junio!

Patrick

Patrick Steinhardt (5):
  reftable/stack: refactor stack reloading to have common exit path
  reftable/stack: refactor reloading to use file descriptor
  reftable/stack: use stat info to avoid re-reading stack list
  reftable/blocksource: refactor code to match our coding style
  reftable/blocksource: use mmap to read tables

 reftable/blocksource.c |  39 ++++++--------
 reftable/stack.c       | 113 +++++++++++++++++++++++++----------------
 reftable/stack.h       |   1 +
 reftable/system.h      |   1 +
 4 files changed, 85 insertions(+), 69 deletions(-)

Range-diff against v1:
1:  01ece2626d ! 1:  4b7f52c415 reftable/stack: refactor stack reloading to have common exit path
    @@ Commit message
         Refactor the function to have a common exit path. While at it, touch up
         the style of this function a bit to match our usual coding style better.
     
    +    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    +
      ## reftable/stack.c ##
     @@ reftable/stack.c: static int tv_cmp(struct timeval *a, struct timeval *b)
      static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
2:  726d302d7b ! 2:  36b9f6b624 reftable/stack: refactor reloading to use file descriptor
    @@ Commit message
         Prepare for this by converting the code to use `fd_read_lines()` so that
         we have the file descriptor readily available.
     
    +    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    +
      ## reftable/stack.c ##
     @@ reftable/stack.c: static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
      	struct timeval deadline;
3:  4fabdc3d80 ! 3:  c0f7cec95b reftable/stack: use stat info to avoid re-reading stack list
    @@ Commit message
         cached value from the last time we have updated the stack. This should
         always work alright because "tables.list" is updated atomically via a
         rename, so even if the ctime or mtime wasn't granular enough to identify
    -    a change, at least the inode number should have changed.
    +    a change, at least the inode number or file size should have changed.
     
         This change significantly speeds up operations where many refs are read,
         like when using git-update-ref(1). The following benchmark creates N
         refs in an otherwise-empty repository via `git update-ref --stdin`:
     
           Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    -        Time (mean ± σ):       5.6 ms ±   0.2 ms    [User: 2.5 ms, System: 3.0 ms]
    -        Range (min … max):     5.2 ms …   6.0 ms    89 runs
    +        Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    +        Range (min … max):     4.8 ms …   7.2 ms    109 runs
     
           Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    -        Time (mean ± σ):      19.2 ms ±   0.3 ms    [User: 8.6 ms, System: 10.4 ms]
    -        Range (min … max):    18.4 ms …  19.8 ms    63 runs
    +        Time (mean ± σ):      19.1 ms ±   0.9 ms    [User: 8.9 ms, System: 9.9 ms]
    +        Range (min … max):    18.4 ms …  26.7 ms    72 runs
     
           Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    -        Time (mean ± σ):      1.310 s ±  0.014 s    [User: 0.566 s, System: 0.724 s]
    -        Range (min … max):    1.291 s …  1.342 s    10 runs
    +        Time (mean ± σ):      1.336 s ±  0.018 s    [User: 0.590 s, System: 0.724 s]
    +        Range (min … max):    1.314 s …  1.373 s    10 runs
     
           Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    -        Time (mean ± σ):       5.7 ms ±   0.4 ms    [User: 2.6 ms, System: 3.1 ms]
    -        Range (min … max):     5.1 ms …   9.5 ms    91 runs
    +        Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    +        Range (min … max):     4.8 ms …   7.2 ms    109 runs
     
           Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    -        Time (mean ± σ):      15.2 ms ±   0.3 ms    [User: 7.0 ms, System: 8.1 ms]
    -        Range (min … max):    14.3 ms …  17.1 ms    71 runs
    +        Time (mean ± σ):      14.8 ms ±   0.2 ms    [User: 7.1 ms, System: 7.5 ms]
    +        Range (min … max):    14.2 ms …  15.2 ms    82 runs
     
           Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    -        Time (mean ± σ):     916.1 ms ±  11.0 ms    [User: 420.8 ms, System: 495.0 ms]
    -        Range (min … max):   906.9 ms … 943.8 ms    10 runs
    +        Time (mean ± σ):     927.6 ms ±   5.3 ms    [User: 437.8 ms, System: 489.5 ms]
    +        Range (min … max):   919.4 ms … 936.4 ms    10 runs
     
           Summary
    -        update-ref: create many refs (refcount = 1, revision = HEAD~) ran
    -          1.01 ± 0.09 times faster than update-ref: create many refs (refcount = 1, revision = HEAD)
    -          2.72 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
    -          3.42 ± 0.13 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    -        163.59 ± 5.62 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    -        233.91 ± 7.92 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
    +        update-ref: create many refs (refcount = 1, revision = HEAD) ran
    +          1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
    +          2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
    +          3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    +        181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    +        261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
    +
    +    Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## reftable/stack.c ##
     @@ reftable/stack.c: void reftable_stack_destroy(struct reftable_stack *st)
    @@ reftable/stack.c: static int reftable_stack_reload_maybe_reuse(struct reftable_s
     +	stat_validity_update(&st->list_validity, fd);
     +
      out:
    ++	if (err)
    ++		stat_validity_clear(&st->list_validity);
      	if (fd >= 0)
      		close(fd);
    + 	free_names(names);
     @@ reftable/stack.c: static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
      static int stack_uptodate(struct reftable_stack *st)
      {
-:  ---------- > 4:  96e79dc3ba reftable/blocksource: refactor code to match our coding style
4:  a23f38a806 ! 5:  e53a20c8e1 reftable/blocksource: use mmap to read tables
    @@ Commit message
         already and ignores any such errors. Also, `reftable_stack_clean()` will
         prune stale tables which are not referenced by "tables.list" anymore so
         that those files can eventually be pruned. And second, we never rewrite
    -    already-rewritten stacks, so it does not matter that we cannot rename a
    +    already-written stacks, so it does not matter that we cannot rename a
         file over an mmaped file, either.
     
         Another unfortunate property of mmap is that it is not supported by all
         systems. But given that the size of reftables should typically be rather
         limited (megabytes at most in the vast majority of repositories), we can
    -    provide an easy fallback by just reading the complete table into memory
    -    on such platforms. This is the same strategy that the "packed" backend
    -    uses in case the platform does not provide mmap.
    +    use the fallback implementation provided by `git_mmap()` which reads the
    +    whole file into memory instead. This is the same strategy that the
    +    "packed" backend uses.
     
         While this change doesn't significantly improve performance in the case
         where we're seeking through stacks once (like e.g. git-for-each-ref(1)
    @@ Commit message
         empty repository:
     
           Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    -        Time (mean ± σ):       5.6 ms ±   0.2 ms    [User: 2.7 ms, System: 2.8 ms]
    -        Range (min … max):     5.1 ms …   6.0 ms    90 runs
    +        Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.5 ms, System: 2.5 ms]
    +        Range (min … max):     4.8 ms …   7.1 ms    111 runs
     
           Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    -        Time (mean ± σ):      15.1 ms ±   0.4 ms    [User: 7.1 ms, System: 8.0 ms]
    -        Range (min … max):    14.2 ms …  16.5 ms    71 runs
    +        Time (mean ± σ):      14.8 ms ±   0.5 ms    [User: 7.1 ms, System: 7.5 ms]
    +        Range (min … max):    14.1 ms …  18.7 ms    84 runs
     
           Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    -        Time (mean ± σ):     919.4 ms ±  11.2 ms    [User: 427.5 ms, System: 491.7 ms]
    -        Range (min … max):   908.1 ms … 936.6 ms    10 runs
    +        Time (mean ± σ):     926.4 ms ±   5.6 ms    [User: 448.5 ms, System: 477.7 ms]
    +        Range (min … max):   920.0 ms … 936.1 ms    10 runs
     
           Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    -        Time (mean ± σ):       5.5 ms ±   0.3 ms    [User: 2.4 ms, System: 3.1 ms]
    -        Range (min … max):     5.0 ms …   7.3 ms    89 runs
    +        Time (mean ± σ):       5.0 ms ±   0.2 ms    [User: 2.4 ms, System: 2.5 ms]
    +        Range (min … max):     4.7 ms …   5.4 ms    111 runs
     
           Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    -        Time (mean ± σ):      10.4 ms ±   0.3 ms    [User: 5.1 ms, System: 5.3 ms]
    -        Range (min … max):     9.7 ms …  11.0 ms    78 runs
    +        Time (mean ± σ):      10.5 ms ±   0.2 ms    [User: 5.0 ms, System: 5.3 ms]
    +        Range (min … max):    10.0 ms …  10.9 ms    93 runs
     
           Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    -        Time (mean ± σ):     483.5 ms ±   6.3 ms    [User: 227.8 ms, System: 255.6 ms]
    -        Range (min … max):   477.5 ms … 498.8 ms    10 runs
    +        Time (mean ± σ):     529.6 ms ±   9.1 ms    [User: 268.0 ms, System: 261.4 ms]
    +        Range (min … max):   522.4 ms … 547.1 ms    10 runs
     
           Summary
             update-ref: create many refs (refcount = 1, revision = HEAD) ran
               1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
    -          1.89 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
    -          2.73 ± 0.16 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    -         87.55 ± 4.65 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    -        166.48 ± 8.80 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
    +          2.08 ± 0.07 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
    +          2.95 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    +        105.33 ± 3.76 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    +        184.24 ± 5.89 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
     
         Theoretically, we could also replicate the strategy of the "packed"
         backend where small tables are read into memory instead of using mmap.
         Benchmarks did not confirm that this has a performance benefit though.
     
    +    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    +
      ## reftable/blocksource.c ##
    -@@ reftable/blocksource.c: license that can be found in the LICENSE file or at
    - #include "reftable-blocksource.h"
    - #include "reftable-error.h"
    - 
    -+#if defined(NO_MMAP)
    -+static int use_mmap = 0;
    -+#else
    -+static int use_mmap = 1;
    -+#endif
    -+
    - static void strbuf_return_block(void *b, struct reftable_block *dest)
    - {
    - 	if (dest->len)
     @@ reftable/blocksource.c: struct reftable_block_source malloc_block_source(void)
    + }
    + 
      struct file_block_source {
    - 	int fd;
    +-	int fd;
      	uint64_t size;
     +	unsigned char *data;
      };
    @@ reftable/blocksource.c: static uint64_t file_size(void *b)
     -	if (fd > 0) {
     -		close(fd);
     -		((struct file_block_source *)b)->fd = 0;
    +-	}
    +-
     +	struct file_block_source *b = v;
    -+
    -+	if (b->fd >= 0) {
    -+		close(b->fd);
    -+		b->fd = -1;
    - 	}
    - 
    -+	if (use_mmap)
    -+		munmap(b->data, b->size);
    -+	else
    -+		reftable_free(b->data);
    -+	b->data = NULL;
    -+
    ++	munmap(b->data, b->size);
      	reftable_free(b);
      }
      
    @@ reftable/blocksource.c: static int file_read_block(void *v, struct reftable_bloc
      	return size;
      }
     @@ reftable/blocksource.c: int reftable_block_source_from_file(struct reftable_block_source *bs,
    - {
    - 	struct stat st = { 0 };
    - 	int err = 0;
    --	int fd = open(name, O_RDONLY);
    -+	int fd;
    - 	struct file_block_source *p = NULL;
    -+
    -+	fd = open(name, O_RDONLY);
    - 	if (fd < 0) {
    - 		if (errno == ENOENT) {
    - 			return REFTABLE_NOT_EXIST_ERROR;
    -@@ reftable/blocksource.c: int reftable_block_source_from_file(struct reftable_block_source *bs,
      
    - 	p = reftable_calloc(sizeof(struct file_block_source));
    + 	p = reftable_calloc(sizeof(*p));
      	p->size = st.st_size;
     -	p->fd = fd;
    -+	if (use_mmap) {
    -+		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    -+		p->fd = fd;
    -+	} else {
    -+		p->data = xmalloc(st.st_size);
    -+		if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
    -+			close(fd);
    -+			return -1;
    -+		}
    -+		close(fd);
    -+		p->fd = -1;
    -+	}
    ++	p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    ++	close(fd);
      
      	assert(!bs->ops);
      	bs->ops = &file_vtable;

base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 1/5] reftable/stack: refactor stack reloading to have common exit path
From: Patrick Steinhardt @ 2024-01-11 10:06 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704966670.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3371 bytes --]

The `reftable_stack_reload_maybe_reuse()` function is responsible for
reloading the reftable list from disk. The function is quite hard to
follow though because it has a bunch of different exit paths, many of
which have to free the same set of resources.

Refactor the function to have a common exit path. While at it, touch up
the style of this function a bit to match our usual coding style better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 86 +++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 44 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 16bab82063..bf869a6772 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -304,69 +304,67 @@ static int tv_cmp(struct timeval *a, struct timeval *b)
 static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 					     int reuse_open)
 {
-	struct timeval deadline = { 0 };
-	int err = gettimeofday(&deadline, NULL);
+	char **names = NULL, **names_after = NULL;
+	struct timeval deadline;
 	int64_t delay = 0;
-	int tries = 0;
-	if (err < 0)
-		return err;
+	int tries = 0, err;
 
+	err = gettimeofday(&deadline, NULL);
+	if (err < 0)
+		goto out;
 	deadline.tv_sec += 3;
+
 	while (1) {
-		char **names = NULL;
-		char **names_after = NULL;
-		struct timeval now = { 0 };
-		int err = gettimeofday(&now, NULL);
-		int err2 = 0;
-		if (err < 0) {
-			return err;
-		}
+		struct timeval now;
 
-		/* Only look at deadlines after the first few times. This
-		   simplifies debugging in GDB */
+		err = gettimeofday(&now, NULL);
+		if (err < 0)
+			goto out;
+
+		/*
+		 * Only look at deadlines after the first few times. This
+		 * simplifies debugging in GDB.
+		 */
 		tries++;
-		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
-			break;
-		}
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0)
+			goto out;
 
 		err = read_lines(st->list_file, &names);
-		if (err < 0) {
-			free_names(names);
-			return err;
-		}
+		if (err < 0)
+			goto out;
+
 		err = reftable_stack_reload_once(st, names, reuse_open);
-		if (err == 0) {
-			free_names(names);
+		if (!err)
 			break;
-		}
-		if (err != REFTABLE_NOT_EXIST_ERROR) {
-			free_names(names);
-			return err;
-		}
-
-		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
-		   writer. Check if there was one by checking if the name list
-		   changed.
-		*/
-		err2 = read_lines(st->list_file, &names_after);
-		if (err2 < 0) {
-			free_names(names);
-			return err2;
-		}
-
+		if (err != REFTABLE_NOT_EXIST_ERROR)
+			goto out;
+
+		/*
+		 * REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		 * writer. Check if there was one by checking if the name list
+		 * changed.
+		 */
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0)
+			goto out;
 		if (names_equal(names_after, names)) {
-			free_names(names);
-			free_names(names_after);
-			return err;
+			err = REFTABLE_NOT_EXIST_ERROR;
+			goto out;
 		}
+
 		free_names(names);
+		names = NULL;
 		free_names(names_after);
+		names_after = NULL;
 
 		delay = delay + (delay * rand()) / RAND_MAX + 1;
 		sleep_millisec(delay);
 	}
 
-	return 0;
+out:
+	free_names(names);
+	free_names(names_after);
+	return err;
 }
 
 /* -1 = error
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 2/5] reftable/stack: refactor reloading to use file descriptor
From: Patrick Steinhardt @ 2024-01-11 10:06 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704966670.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]

We're about to introduce a stat(3P)-based caching mechanism to reload
the list of stacks only when it has changed. In order to avoid race
conditions this requires us to have a file descriptor available that we
can use to call fstat(3P) on.

Prepare for this by converting the code to use `fd_read_lines()` so that
we have the file descriptor readily available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index bf869a6772..b1ee247601 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -308,6 +308,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 	struct timeval deadline;
 	int64_t delay = 0;
 	int tries = 0, err;
+	int fd = -1;
 
 	err = gettimeofday(&deadline, NULL);
 	if (err < 0)
@@ -329,9 +330,19 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 		if (tries > 3 && tv_cmp(&now, &deadline) >= 0)
 			goto out;
 
-		err = read_lines(st->list_file, &names);
-		if (err < 0)
-			goto out;
+		fd = open(st->list_file, O_RDONLY);
+		if (fd < 0) {
+			if (errno != ENOENT) {
+				err = REFTABLE_IO_ERROR;
+				goto out;
+			}
+
+			names = reftable_calloc(sizeof(char *));
+		} else {
+			err = fd_read_lines(fd, &names);
+			if (err < 0)
+				goto out;
+		}
 
 		err = reftable_stack_reload_once(st, names, reuse_open);
 		if (!err)
@@ -356,12 +367,16 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 		names = NULL;
 		free_names(names_after);
 		names_after = NULL;
+		close(fd);
+		fd = -1;
 
 		delay = delay + (delay * rand()) / RAND_MAX + 1;
 		sleep_millisec(delay);
 	}
 
 out:
+	if (fd >= 0)
+		close(fd);
 	free_names(names);
 	free_names(names_after);
 	return err;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox