Git development

Git development
 help / color / mirror / Atom feed

* Re: git bisect stuck - --force flag required for checkout
From: brian m. carlson @ 2023-12-21 12:22 UTC (permalink / raw)
  To: Devste Devste; +Cc: git
In-Reply-To: <CANM0SV3SEF28QJ2V0Q9ydp8yDbL8TDc1m871oxOB=UtwF1TtxQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2016 bytes --]

On 2023-12-21 at 10:47:57, Devste Devste wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
> add file Foo.txt to .git and commit
> add some commits with any changes to other files, as this is needed
> for reproduction
> run: git config core.ignorecase false

`core.ignorecase` is specifically designed for this case.  It's set
internally by Git when the repository is created, and it's not supposed
to be changed by the user.

If you want a repository where there's no case sensitivity, then I'd
recommend WSL.  It's also possible to make some directories case
sensitive in Windows 10 and newer and allegedly that works recursively,
so you could use `fsutil` to do that, then run `git init`, then add
data.

> rename Foo.txt to foo.txt and commit
> add some commits with any changes to other files, as this is needed
> for reproduction
> run: git bisect start && git bisect bad
> eventually, when running "git bisect good" (or bad) you will get an error:
> >error: The following untracked working tree files would be overwritten by checkout:
> >Foo.php
> 
> Anything else you want to add:
> git bisect good/bad needs to have support for a "--force" flag, which
> is passed to the git checkout it runs internally
> At the moment git bisect cannot be used on Windows, as there is no way
> to continue the bisect from here.
> Changing the "git config core.ignorecase true" temporarily is not an
> option, as this will introduce a variety of other bugs,
> which, on Windows, eventually will require you to completely delete
> and reclone the repo, as Windows file paths are case-insensitive

Could you share what those problems are?  `core.ignorecase` is
specifically designed to deal with case-insensitive file systems, and
that's why Git sets it to true.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH] t1006: add tests for %(objectsize:disk)
From: René Scharfe @ 2023-12-21 12:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <20231221094722.GA570888@coredump.intra.peff.net>

Am 21.12.23 um 10:47 schrieb Jeff King:
> On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:
>
>>> This adds a packed-object function, but I doubt anybody actually calls
>>> it. If we're going to do that, it's probably worth adding some tests for
>>> "cat-file --batch-check" or similar.
>>
>> Yes, and I was assuming that someone else would be eager to add such
>> tests. *ahem*
>
> :P OK, here it is. This can be its own topic, or go on top of the
> rs/t6300-compressed-size-fix branch.

Great, thank you!

> -- >8 --
> Subject: [PATCH] t1006: add tests for %(objectsize:disk)
>
> Back when we added this placeholder in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), there were no tests,
> claiming "[...]the exact numbers returned are volatile and subject to
> zlib and packing decisions".
>
> But we can use a little shell hackery to get the expected numbers
> ourselves. To a certain degree this is just re-implementing what Git is
> doing under the hood, but it is still worth doing. It makes sure we
> exercise the %(objectsize:disk) code at all, and having the two
> implementations agree gives us more confidence.
>
> Note that our shell code assumes that no object appears twice (either in
> two packs, or as both loose and packed), as then the results really are
> undefined. That's OK for our purposes, and the test will notice if that
> assumption is violated (the shell version would produce duplicate lines
> that Git's output does not have).
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I stole a bit of your awk. You can tell because I'd have written it in
> perl. ;)

I think we can do it even in shell, especially if...

>
>  t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 271c5e4fd3..21915be308 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
>  	cmp expect actual
>  '
>
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&
> +		rawsz=$(test_oid rawsz) &&
> +		find .git/objects/pack -name "*.idx" |
> +		while read idx
> +		do
> +			git show-index <"$idx" >idx.raw &&
> +			sort -n <idx.raw >idx.sorted &&
> +			packsz=$(test_file_size "${idx%.idx}.pack") &&
> +			end=$((packsz - rawsz)) &&
> +			awk -v end="$end" "
> +			  NR > 1 { print oid, \$1 - start }
> +			  { start = \$1; oid = \$2 }
> +			  END { print oid, end - start }
> +			" idx.sorted ||

... we stop slicing the data against the grain.  Let's reverse the order
(sort -r), then we don't need to carry the oid forward:

			sort -nr <idx.raw >idx.sorted &&
			packsz=$(test_file_size "${idx%.idx}.pack") &&
			end=$((packsz - rawsz)) &&
			awk -v end="$end" "
			  { print \$2, end - \$1; end = \$1 }
			" idx.sorted ||

And at that point it should be easy to use a shell loop instead of awk:

			while read start oid rest
			do
				size=$((end - start)) &&
				end=$start &&
				echo "$oid $size" ||
				return 1
			done <idx.sorted

> +			return 1
> +		done
> +	} >expect.raw &&
> +	sort <expect.raw >expect &&

The reversal above becomes irrelevant with that line, so the result in
expect stays the same.

Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.

> +	git cat-file --batch-all-objects \
> +		--batch-check="%(objectname) %(objectsize:disk)" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'set up replacement object' '
>  	orig=$(git rev-parse HEAD) &&
>  	git cat-file commit $orig >orig &&

One more thing: We can do the work of the first awk invocation in the
already existing loop as well:

> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&

... but the substitutions are a bit awkward:

		find .git/objects/?? -type f |
		while read path
		do
			basename=${path##*/} &&
			dirname=${path%/$basename} &&
			oid="${dirname#.git/objects/}${basename}" &&
			size=$(test_file_size "$path") &&
			echo "$oid $size" ||
			return 1
		done &&

The avoided awk invocation might be worth the trouble, though.

René

^ permalink raw reply

* Re: Git mirror at gitlab
From: Patrick Steinhardt @ 2023-12-21 11:48 UTC (permalink / raw)
  To: Olliver Schinagl
  Cc: git, gitster, Ævar Arnfjörð Bjarmason, psteinhardt
In-Reply-To: <2a833bfc-a075-4e78-ae6c-270f5198d498@schinagl.nl>

[-- Attachment #1: Type: text/plain, Size: 3239 bytes --]

On Thu, Dec 21, 2023 at 12:30:02PM +0100, Olliver Schinagl wrote:
> Hey list,
> 
> For years, I wanted (tried, but time) to run the mirror for git on gitlab.
> Actually, the original idea was to run a docker container ([0] 10k+ pulls
> :p)
> 
> I initially set this up via docker build containers, where docker hub would
> pull my mirror of the git repo. My mirror, because I added a Dockerfile
> which was enough for docker to do its trick. I was planning (time ..) on
> submitting this upstream to the list, but never did. Because of me not doing
> that, I had to manually (I was even too lazy to script it) rebase the
> branch. Docker then did some changes to their business, where the docker
> builds where not possible anymore.
> 
> So then I figured, I'll do the same on gitlab and push it to the docker hub.
> Thus I setup a mirror on gitlab [1], with the idea to work there on it.
> 
> Again, I never got around to finalize this work, mostly because the docker
> container 'just worked' for pretty much everything. After all, git is very
> stable overal.
> 
> So very interestingly, last month commit 0e3b67e2aa25edb ("ci: add support
> for GitLab CI") landed, which started to trigger pipeline jobs!
> 
> Sadly, this only worked for 3 builds, as that's when the minutes ran out :)
> 
> So one, I would very much like to offer the registered names (cause they are
> pretty nice in name) to here, so people can use and find it.

Not to throw a wrench into this, but are you aware of the official
GitLab mirror at https://gitlab.com/git-vcs/git? I myself wasn't aware
of this mirror for a rather long time.

I also wondered whether we want to have https://gitlab.com/git/git as we
do on GitHub. I don't think anybody registered it, but it is blocked
from being registered as far as I can tell. Maybe we block the namespace
out of caution, I dunno. I can certainly check in with our SREs in case
it is something the Git project would like to own.

> Two, hopefully get Patrick Steinhardt to help out to get unlimited minutes
> and storage on the repo :)

I'm sure we can do something here, but I'd rather aim to do this for the
official mirror which currently is the one I mentioned above. If the
project is interested in running builds on GitLab then I'm happy to
coordinate.

Also Cc Ævar, who is the current owner of the mirror. Would it be
possible to add myself as a second owner to the project? This might help
setting up the CI infrastructure. But please, if anybody disagrees with
me being added as an owner here I encourage you to say so.

> Three, see what the opinion of people here is on this. I'll do the work to
> get the dockerfile (now containerfile, we're inclusive after all) merged,
> and the CI file updated to create, store (and push to docker hub) the
> generated containers.

I don't really have much of an opinion here and will leave it to others
to discuss.

Thanks!

Patrick

PS: As most other folks I'll be OOO during holidays, so I may only
    answer sporadically.

> Thanks,
> Olliver
> 
> [0]: https://hub.docker.com/r/gitscm/git
> [1]: https://gitlab.com/gitscm/git
> 
> P.S. I'm not subscribed, so please keep me in the CC :)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Git mirror at gitlab
From: Olliver Schinagl @ 2023-12-21 11:30 UTC (permalink / raw)
  To: git, ps, gitster

Hey list,

For years, I wanted (tried, but time) to run the mirror for git on 
gitlab. Actually, the original idea was to run a docker container ([0] 
10k+ pulls :p)

I initially set this up via docker build containers, where docker hub 
would pull my mirror of the git repo. My mirror, because I added a 
Dockerfile which was enough for docker to do its trick. I was planning 
(time ..) on submitting this upstream to the list, but never did. 
Because of me not doing that, I had to manually (I was even too lazy to 
script it) rebase the branch. Docker then did some changes to their 
business, where the docker builds where not possible anymore.

So then I figured, I'll do the same on gitlab and push it to the docker 
hub. Thus I setup a mirror on gitlab [1], with the idea to work there on it.

Again, I never got around to finalize this work, mostly because the 
docker container 'just worked' for pretty much everything. After all, 
git is very stable overal.

So very interestingly, last month commit 0e3b67e2aa25edb ("ci: add 
support for GitLab CI") landed, which started to trigger pipeline jobs!

Sadly, this only worked for 3 builds, as that's when the minutes ran out :)

So one, I would very much like to offer the registered names (cause they 
are pretty nice in name) to here, so people can use and find it.

Two, hopefully get Patrick Steinhardt to help out to get unlimited 
minutes and storage on the repo :)

Three, see what the opinion of people here is on this. I'll do the work 
to get the dockerfile (now containerfile, we're inclusive after all) 
merged, and the CI file updated to create, store (and push to docker 
hub) the generated containers.

Thanks,
Olliver

[0]: https://hub.docker.com/r/gitscm/git
[1]: https://gitlab.com/gitscm/git

P.S. I'm not subscribed, so please keep me in the CC :)

^ permalink raw reply

* Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
From: Jeff King @ 2023-12-21 11:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <ZXxy1USjjjAbBi++@nand.local>

On Fri, Dec 15, 2023 at 10:37:57AM -0500, Taylor Blau wrote:

> On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> > So my question is: how much of what you're seeing is from (1) and (2),
> > and how much is from (3)? Because there are other ways to trigger (3),
> > such as lowering the window size. For example, if you try your same
> > packing example with --window=0, how do the CPU and output size compare
> > to the results of your series? (I'd also check peak memory usage).
> 
> Interesting question! Here are some preliminary numbers on my machine
> (which runs Debian unstable with a Intel Xenon W-2255 CPU @ 3.70GHz and
> 64GB of RAM).
> 
> I ran the following hyperfine command on my testing repository, which
> has the Git repository broken up into ~75 packs or so:

Thanks for running these tests. The results are similar to what
expected, which is: yes, most of your CPU savings are from skipping
deltas, but not all.

Here's what I see (which I think is mostly redundant with what you've
said, but I just want to lay out my line of thinking). I'll reorder your
quoted sections a bit as I go:

>     Benchmark 2: multi-pack reuse, pack.window=0
>     [...]
>       Time (mean ± σ):      1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
>       Range (min … max):    1.071 s …  1.088 s    10 runs
>
>     Benchmark 4: multi-pack reuse, pack.window=10
>     [...]
>       Time (mean ± σ):      1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
>       Range (min … max):    1.026 s …  1.032 s    10 runs

OK, so when we're doing more full ("multi") reuse, the pack window
doesn't make a big difference either way. You didn't show the stderr
from each, but presumably most of the objects are hitting the "reuse"
path, and only a few are deltas (and that is backed up by the fact that
doing deltas only gives us a slight improvement in the output size:

>     Benchmark 2: multi-pack reuse, pack.window=0
>     268.670 MB
>     Benchmark 4: multi-pack reuse, pack.window=10
>     266.473 MB

Comparing the runs with less reuse:

>     Benchmark 1: single-pack reuse, pack.window=0
>     [...]
>       Time (mean ± σ):      1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
>       Range (min … max):    1.244 s …  1.259 s    10 runs
>
>     Benchmark 3: single-pack reuse, pack.window=10
>     [...]
>       Time (mean ± σ):      6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
>       Range (min … max):    6.252 s …  6.326 s    10 runs

there obviously is a huge amount of time saved by not doing deltas, but
we pay for it with a much bigger pack:

>     Benchmark 1: single-pack reuse, pack.window=0
>     264.443 MB
>     Benchmark 3: single-pack reuse, pack.window=10
>     194.355 MB

But of course that "much bigger" pack is about the same size as the one
we get from doing multi-pack reuse. Which is not surprising, because
both are avoiding looking for new deltas (and the packs after the
preferred one probably have mediocre deltas).

So I do actually think that disabling pack.window gives you a
similar-ish tradeoff to expanding the pack-reuse code (~6s down to ~1s,
and a 36% embiggening of the resulting pack size).

Which implies that one option is to scrap your entire series and just
set pack.window. Basically comparing multi/10 (your patches) to single/0
(a hypothetical config option), which have similar run-times and pack
sizes.

But that's not quite the whole story. There is still a CPU improvement
in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
memory improvement from avoiding the extra book-keeping (almost 10%):

>     Benchmark 1: single-pack reuse, pack.window=0
>     354.224 MB (max RSS)
>     Benchmark 4: multi-pack reuse, pack.window=10
>     328.786 MB (max RSS)

So while it's a lot less code to just set the window size, I do think
those improvements are worth it. And really, it's the same tradeoff we
make for the single-pack case (i.e., one could argue that we
could/should rip out the verbatim-reuse code entirely in favor of just
tweaking the window size).

> It's pretty close between multi-pack reuse with a window size of 0 and
> a window size of 10. If you want to optimize for pack size, you could
> trade a ~4% reduction in pack size for a ~1% increase in peak memory
> usage.

I think if you want to optimize for pack size, you should consider
repacking all-into-one to get better on-disk deltas. ;) I know that's
easier said than done when the I/O costs are significant. I do wonder if
storing thin packs on disk would let us more cheaply reach a state that
could serve optimal-ish packs without spending CPU computing bespoke
deltas for each client. But that's a much larger topic.

-Peff

^ permalink raw reply

* Re: [PATCH 0/8] reftable: small set of fixes
From: Han-Wen Nienhuys @ 2023-12-21 11:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <cover.1700549493.git.ps@pks.im>

On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
> It's a bit unfortunate that we don't yet have good test coverage as
> there are no end-to-end tests, and most of the changes I did are not
> easily testable in unit tests. So until the reftable backend gets
..
>   reftable/stack: perform auto-compaction with transactional interface

you can test autocompaction more precisely using the stats
in reftable_compaction_stats. See
https://github.com/hanwen/reftable/blob/master/stack_test.go#L126
for an example.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* Re: [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Han-Wen Nienhuys @ 2023-12-21 10:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Taylor Blau, git, Jonathan Nieder
In-Reply-To: <ZXbRmwj1vZ2dA3s9@tanuki>

On Mon, Dec 11, 2023 at 10:08 AM Patrick Steinhardt <ps@pks.im> wrote:

> > This initially caught me by surprise, but on closer inspection I agree
> > that this is OK, since stack_filename() calls strbuf_reset() before
> > adjusting the buffer contents.
> >
> > (As a side-note, I do find the side-effect of stack_filename() to be a
> > little surprising, but that's not the fault of this series and not worth
> > changing here.)
>
> Agreed, I also found this to be a bit confusing at first. I'll amend the
> commit message with "Note that we do not have to manually reset the
> buffer because `stack_filename()` does this for us already." to help
> future readers.

In C++ it is expected that assignment operators clear the destination
before executing the assignment, so it depends on your expectations.
If this is confusing, maybe another name is in order?

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* Re: [PATCH v2 00/26] pack-objects: multi-pack verbatim reuse
From: Jeff King @ 2023-12-21 10:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, Patrick Steinhardt
In-Reply-To: <ZXurD1NTZ4TAs7WZ@nand.local>

On Thu, Dec 14, 2023 at 08:25:35PM -0500, Taylor Blau wrote:

> On Thu, Dec 14, 2023 at 04:40:40PM -0800, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> > > I haven't looked into the details yet, but it seems that
> > > t5309-pack-delta-cycles.sh fails under
> > >
> > >     $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make -j16 test
> >
> > Hmph, this seems to be elusive.  I tried it again and then it did
> > not fail this time.
> 
> Indeed, but I was able to reproduce the failure both on my branch and on
> 'master' under --stress, yielding the following failure in t5309.6:

OK, so it's nothing new, and we can ignore it for your series (I haven't
seen it in the wild yet, but it is something we may need to deal with in
the long run if it keeps popping up).

>     + git index-pack --fix-thin --stdin
>     fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)
> 
>     =================================================================
>     ==3904583==ERROR: LeakSanitizer: detected memory leaks
> 
>     Direct leak of 32 byte(s) in 1 object(s) allocated from:
>         #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
>         #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
>         #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
>         #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
>         #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
>         #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
>         #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
>         #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 
>     SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
>     Aborted
> 
> The duplicate base thing is a red-herring, and is an expected result of
> the test. But the leak is definitely not, and I'm not sure what's going
> on here since the frames listed above are in the LSan runtime, not Git.

I suspect this is a race in LSan caused by a thread calling exit() while
other threads are spawning. Here's my theory.

When a thread is spawned, LSan needs to know where its stack is (so it
can look for points to reachable memory). It calls pthread_getattr_np(),
which gets an attributes object that must be cleaned up with
pthread_attr_destroy(). Presumably it does this shortly after. But
there's a race window where that attr object is allocated and we haven't
yet set up the new thread's info. If another thread calls exit() then,
LSan will run but its book-keeping will be in an inconsistent state.

One way to work around that would be to make the creation of the threads
atomic. That is, create each in a suspended state, and only let them run
once they are all created. There's no option in pthreads to do this, but
we can simulate it by having them block on a mutex before starting. And
indeed, we already have such a lock: the work_lock() that they all use
to get data to process.

After applying this patch:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dda94a9f46..0e94819216 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1257,13 +1257,15 @@ static void resolve_deltas(void)
 	base_cache_limit = delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
+		work_lock();
 		for (i = 0; i < nr_threads; i++) {
 			int ret = pthread_create(&thread_data[i].thread, NULL,
 						 threaded_second_pass, thread_data + i);
 			if (ret)
 				die(_("unable to create thread: %s"),
 				    strerror(ret));
 		}
+		work_unlock();
 		for (i = 0; i < nr_threads; i++)
 			pthread_join(thread_data[i].thread, NULL);
 		cleanup_thread();

I ran t5309 with "--stress --run=6" for about 5 minutes with no failures
(whereas without the patch, I usually see a failure in 10 seconds or
so).

So it's a pretty easy fix, though I don't love it in general. Every
place that spawns multiple threads that can die() would need the same
treatment. And this isn't a "real" leak in any reasonable sense; it only
happens because we're exiting the program directly, at which point all
of the memory is returned to the OS anyway. So I hate changing
production code to satisfy a leak-checking false positive.

OTOH, dealing with false positives is annoying for humans, and the
run-time cost should be negligible. We can work around this one, and
avoid making the same change in other spots unless somebody sees a racy
failure in practice.

-Peff

^ permalink raw reply related

* Re: [PATCH v2 08/11] reftable/stack: fix use of unseeded randomness
From: Han-Wen Nienhuys @ 2023-12-21 10:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <bab4fb93df8d1a620eefeef99a49ea52c98dfc6e.1702047081.git.ps@pks.im>

On Fri, Dec 8, 2023 at 3:53 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When writing a new reftable stack, Git will first create the stack with
> a random suffix so that concurrent updates will not try to write to the

I had noticed this already. Thanks for fixing!

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* Re: [PATCH 7/8] reftable/merged: reuse buffer to compute record keys
From: Han-Wen Nienhuys @ 2023-12-21 10:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <23c060d1e21573581ca6c5db50ca756b61078e3e.1700549493.git.ps@pks.im>

On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When iterating over entries in the merged iterator's queue, we compute
> the key of each of the entries and write it into a buffer. We do not
> reuse the buffer though and thus re-allocate it on every iteration,
> which is wasteful given that we never transfer ownership of the
> allocated bytes outside of the loop.
>

From a brief read, change looks good.

In the C code, each key has to pass through the pqueue which is
(assuming auto-compaction) has log2(#tables) entries. The JGit code
assumes that the base table will be very large and the rest small.
This means that most keys come from the base table, and has some
special casing so those keys don't have to pass through the pqueue. If
you worry about efficiency, this might be something to look into.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* git bisect stuck - --force flag required for checkout
From: Devste Devste @ 2023-12-21 10:47 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
add file Foo.txt to .git and commit
add some commits with any changes to other files, as this is needed
for reproduction
run: git config core.ignorecase false
rename Foo.txt to foo.txt and commit
add some commits with any changes to other files, as this is needed
for reproduction
run: git bisect start && git bisect bad
eventually, when running "git bisect good" (or bad) you will get an error:
>error: The following untracked working tree files would be overwritten by checkout:
>Foo.php

Anything else you want to add:
git bisect good/bad needs to have support for a "--force" flag, which
is passed to the git checkout it runs internally
At the moment git bisect cannot be used on Windows, as there is no way
to continue the bisect from here.
Changing the "git config core.ignorecase true" temporarily is not an
option, as this will introduce a variety of other bugs,
which, on Windows, eventually will require you to completely delete
and reclone the repo, as Windows file paths are case-insensitive

[System Info]
git version:
git version 2.39.1.windows.1
cpu: i686
built from commit: b03dafd9c26b06c92d509a07ab01b01e6d0d85ee
sizeof-long: 4
sizeof-size_t: 4
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Windows 10.0
compiler info: gnuc: 12.2
libc info: no libc information available

[Enabled Hooks]

^ permalink raw reply

* Re: [PATCH 5/8] reftable/stack: perform auto-compaction with transactional interface
From: Patrick Steinhardt @ 2023-12-21 10:45 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git, Jonathan Nieder, psteinhardt
In-Reply-To: <CAOw_e7Yfdt_Wqm-9XDJknaN-iH=haP0R4K-S4c_E3EFDzvG5aA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 930 bytes --]

On Thu, Dec 21, 2023 at 11:29:52AM +0100, Han-Wen Nienhuys wrote:
> On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
> > Whenever updating references or reflog entries in the reftable stack, we
> > need to add a new table to the stack, thus growing the stack's length by
> ..
> 
> bug is correctly identified, but can't the fix remove the
> 
>          if (!st->disable_auto_compact)
>                 return reftable_stack_auto_compact(st);
> 
> fragment from reftable_stack_add? reftable_addition_commit eventually
> calls reftable_addition_commit.

"calls reftable_stack_add" you probably mean here. But yes, you're
right. As this patch series has already been merged to `next` I can roll
this cleanup into my second patch series that addresses issues with the
reftable library [1]. Does that work for you?

[1]: http://public-inbox.org/git/xmqqr0jgsn9g.fsf@gitster.g/T/#t

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 11/11] reftable/block: reuse buffer to compute record keys
From: Han-Wen Nienhuys @ 2023-12-21 10:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <02b11f3a80608ba8748a0d0e2294f432e02464e5.1702047081.git.ps@pks.im>

On Fri, Dec 8, 2023 at 3:53 PM Patrick Steinhardt <ps@pks.im> wrote:

> @@ -84,10 +84,12 @@ struct block_iter {
>
>         /* key for last entry we read. */
>         struct strbuf last_key;
> +       struct strbuf key;
>  };

it's slightly more efficient, but the new field has no essential
meaning. If I encountered this code with the change you make here, I
would probably refactor it in the opposite direction to increase code
clarity.

I suspect that the gains are too small to be measurable, but if you
are after small efficiency gains, you can have
reftable_record_decode() consume the key to avoid copying overhead in
record.c.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* Re: [PATCH 5/8] reftable/stack: perform auto-compaction with transactional interface
From: Han-Wen Nienhuys @ 2023-12-21 10:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <25522b042cdc5986972cc7b62e6b88be0569d3cb.1700549493.git.ps@pks.im>

On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
> Whenever updating references or reflog entries in the reftable stack, we
> need to add a new table to the stack, thus growing the stack's length by
..

bug is correctly identified, but can't the fix remove the

         if (!st->disable_auto_compact)
                return reftable_stack_auto_compact(st);

fragment from reftable_stack_add? reftable_addition_commit eventually
calls reftable_addition_commit.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply

* Re: [PATCH v3 0/4] refs: improve handling of special refs
From: Patrick Steinhardt @ 2023-12-21 10:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <xmqqil4ssmfg.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 857 bytes --]

On Wed, Dec 20, 2023 at 11:28:51AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Patrick Steinhardt (4):
> >   wt-status: read HEAD and ORIG_HEAD via the refdb
> >   refs: propagate errno when reading special refs fails
> >   refs: complete list of special refs
> >   bisect: consistently write BISECT_EXPECTED_REV via the refdb
> 
> With the clear understanding that we plan to make those other than
> FETCH_HEAD and MERGE_HEAD in the is_special_ref().special_refs[]
> eventually not special at all, this round looked quite sensible to
> me.
> 
> Let's merge it down to 'next'.

Thanks. We (myself or a fellow team member at GitLab) will work on
converting the remaining not-so-special refs once this topic lands on
the `master` branch. Unless of course somebody else picks it up before
we do.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] git-compat-util: convert skip_{prefix,suffix}{,_mem} to bool
From: Jeff King @ 2023-12-21  9:59 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, AtariDreams via GitGitGadget, Seija Kijin, Junio C Hamano,
	Phillip Wood
In-Reply-To: <2d30dc36-6091-4b47-846f-92d3f4a8b135@web.de>

On Sat, Dec 16, 2023 at 11:47:21AM +0100, René Scharfe wrote:

> Use the data type bool and its values true and false to document the
> binary return value of skip_prefix() and friends more explicitly.
> 
> This first use of stdbool.h, introduced with C99, is meant to check
> whether there are platforms that claim support for C99, as tested by
> 7bc341e21b (git-compat-util: add a test balloon for C99 support,
> 2021-12-01), but still lack that header for some reason.
> 
> A fallback based on a wider type, e.g. int, would have to deal with
> comparisons somehow to emulate that any non-zero value is true:
> 
>    bool b1 = 1;
>    bool b2 = 2;
>    if (b1 == b2) puts("This is true.");
> 
>    int i1 = 1;
>    int i2 = 2;
>    if (i1 == i2) puts("Not printed.");
>    #define BOOLEQ(a, b) (!(a) == !(b))
>    if (BOOLEQ(i1, i2)) puts("This is true.");
> 
> So we'd be better off using bool everywhere without a fallback, if
> possible.  That's why this patch doesn't include any.

Thanks for putting this together. I agree this is the right spot to end
up for now (and that if for whatever reason we find that some platforms
can't handle it, we probably should revert and not try the naive
fallback).

-Peff

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Jeff King @ 2023-12-21  9:56 UTC (permalink / raw)
  To: phillip.wood
  Cc: René Scharfe, AtariDreams via GitGitGadget, git, Seija Kijin
In-Reply-To: <99b3a727-36fd-4fa5-a6be-60ae6fc5911e@gmail.com>

On Fri, Dec 15, 2023 at 02:46:36PM +0000, Phillip Wood wrote:

> > Yeah. b2() is wrong for passing "2" to a bool.
> 
> I think it depends what you mean by "wrong" §6.3.1.2 of standard is quite
> clear that when any non-zero scalar value is converted to _Bool the result
> is "1"

Yeah, sorry, I was being quite sloppy with my wording. I meant "wrong"
as in "I would ideally flag this in review for being weird and
confusing".

Of course there are many reasonable cases where you might pass an
integer "foo" rather than explicitly booleanizing it with "!!foo". So I
do agree it's a real potential problem (and I'm sufficiently convinced
that we should avoid an "int" fallback if we can).

-Peff

^ permalink raw reply

* [PATCH] t1006: add tests for %(objectsize:disk)
From: Jeff King @ 2023-12-21  9:47 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <6750c93c-78d0-46b5-bfc2-0774156ed2ed@web.de>

On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:

> > This adds a packed-object function, but I doubt anybody actually calls
> > it. If we're going to do that, it's probably worth adding some tests for
> > "cat-file --batch-check" or similar.
> 
> Yes, and I was assuming that someone else would be eager to add such
> tests. *ahem*

:P OK, here it is. This can be its own topic, or go on top of the
rs/t6300-compressed-size-fix branch.

> > At which point I wonder if rather than having a function for a single
> > object, we are better off just testing the result of:
> >
> >   git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'
> >
> > against a single post-processed "show-index" invocation.
> 
> Sure, we might want to optimize for bulk-processing and possibly end up
> only checking the size of single objects in t6300, making new library
> functions unnecessary.

So yeah, I think the approach here makes library functions unnecessary
(and I see you already asked Junio to drop your patch 2).

> When dumping size information of multiple objects, it's probably a good
> idea to include "%(objectname)" as well in the format.

Yep, definitely.

-- >8 --
Subject: [PATCH] t1006: add tests for %(objectsize:disk)

Back when we added this placeholder in a4ac106178 (cat-file: add
%(objectsize:disk) format atom, 2013-07-10), there were no tests,
claiming "[...]the exact numbers returned are volatile and subject to
zlib and packing decisions".

But we can use a little shell hackery to get the expected numbers
ourselves. To a certain degree this is just re-implementing what Git is
doing under the hood, but it is still worth doing. It makes sure we
exercise the %(objectsize:disk) code at all, and having the two
implementations agree gives us more confidence.

Note that our shell code assumes that no object appears twice (either in
two packs, or as both loose and packed), as then the results really are
undefined. That's OK for our purposes, and the test will notice if that
assumption is violated (the shell version would produce duplicate lines
that Git's output does not have).

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
I stole a bit of your awk. You can tell because I'd have written it in
perl. ;)

 t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 271c5e4fd3..21915be308 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
 	cmp expect actual
 '

+test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
+	# our state has both loose and packed objects,
+	# so find both for our expected output
+	{
+		find .git/objects/?? -type f |
+		awk -F/ "{ print \$0, \$3\$4 }" |
+		while read path oid
+		do
+			size=$(test_file_size "$path") &&
+			echo "$oid $size" ||
+			return 1
+		done &&
+		rawsz=$(test_oid rawsz) &&
+		find .git/objects/pack -name "*.idx" |
+		while read idx
+		do
+			git show-index <"$idx" >idx.raw &&
+			sort -n <idx.raw >idx.sorted &&
+			packsz=$(test_file_size "${idx%.idx}.pack") &&
+			end=$((packsz - rawsz)) &&
+			awk -v end="$end" "
+			  NR > 1 { print oid, \$1 - start }
+			  { start = \$1; oid = \$2 }
+			  END { print oid, end - start }
+			" idx.sorted ||
+			return 1
+		done
+	} >expect.raw &&
+	sort <expect.raw >expect &&
+	git cat-file --batch-all-objects \
+		--batch-check="%(objectname) %(objectsize:disk)" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up replacement object' '
 	orig=$(git rev-parse HEAD) &&
 	git cat-file commit $orig >orig &&
-- 
2.43.0.430.gaf21263e5d

^ permalink raw reply related

* Re: [RFC/PATCH] archive: "--list" does not take further options
From: Jeff King @ 2023-12-21  8:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git
In-Reply-To: <296e8d69-c1d7-4ad2-943a-dfc54940abc2@web.de>

On Thu, Dec 21, 2023 at 08:30:36AM +0100, René Scharfe wrote:

> >    I do not like the remote error behaviour this one adds at all.
> >    Do we use a more proper mechanism to propagate a remote error
> >    back for other subcommands we can reuse here?
> 
> Don't we have one?  It would affect other unsupported options as well,
> and this seems to work just fine, e.g.:
> 
>    $ git archive --remote=. --format=foo HEAD
>    remote: fatal: Unknown archive format 'foo'
>    remote: git upload-archive: archiver died with error
>    fatal: sent error to the client: git upload-archive: archiver died with error

Right. The whole idea of upload-archive is to spawn a separate writer
process and mux the conversation (including errors) back over the wire.
There are a zillion reasons it can die (including bad arguments) and we
catch and report them in the muxing process.

> >  	if (list) {
> > +		if (argc) {
> > +			if (!is_remote)
> > +				die(_("extra command line parameter '%s'"), *argv);
> > +			else
> > +				printf("!ERROR! extra command line parameter '%s'\n",
> > +				       *argv);
> > +		}
> 
> So just call die() here?

Yes, exactly.

-Peff

^ permalink raw reply

* Re: [RFC/PATCH] archive: "--list" does not take further options
From: Jeff King @ 2023-12-21  8:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe
In-Reply-To: <xmqqttocp98r.fsf@gitster.g>

On Wed, Dec 20, 2023 at 06:41:56PM -0800, Junio C Hamano wrote:

> ---
>  * This was done to convince myself that even though cmd_archive()
>    calls parse_options with PARSE_OPT_KEEP_UNKNOWN_OPT and then
>    uses the resulting argc/argv without apparent filtering of the
>    "--end-of-options" thing, it is correctly handling it, either
>    locally or remotely.
> 
>    - locally, write_archive() uses parse_archive_args(), which calls
>      parse_options() without KEEP_UNKNOWN_OPT and "--end-of-options"
>      is handled there just fine.

Right. Not only is it handled by the second parser, but _not_ keeping it
would be a bug, because that second parser would have no idea that we
saw end-of-options (and so "git archive --end-of-options --foo" would
try to treat "--foo" as an option).

The recent commit 9385174627 did fix a case like this for fast-export,
but git-archive was not changed. It passed KEEP_DASHDASH along with
KEEP_UNKNOWN_OPT, so it already retained and passed along
--end-of-options.

>    - remotely, run_remote_archiver() relays the local parameters,
>      including "--end-of-options" via the "argument" packet.  Then
>      the arguments are assembled into a strvec and used by the
>      upload-archive running on the other side to spawn an
>      upload-archive--writer process with.
>      cmd_upload_archive_writer() then makes the same write_archive()
>      call; "--end-of-options" would still be in argv[] if the
>      original "git archive --remote" invocation had one, but it is
>      consumed the same way as the local case in write_archive() by
>      calling parse_archive_args().

Right, and this is just the same case but with a lot of complicated
network-ferrying in between. :) We must retain --end-of-options so that
the next parser knows to stop treating them as arguments. And because it
doesn't use KEEP_UNKNOWN_OPT, the ferried "--end-of-options" is removed
then.

> diff --git c/archive.c w/archive.c
> index 9aeaf2bd87..3244e9f9f2 100644
> --- c/archive.c
> +++ w/archive.c
> @@ -641,6 +641,13 @@ static int parse_archive_args(int argc, const char **argv,
>  		base = "";
>  
>  	if (list) {
> +		if (argc) {
> +			if (!is_remote)
> +				die(_("extra command line parameter '%s'"), *argv);
> +			else
> +				printf("!ERROR! extra command line parameter '%s'\n",
> +				       *argv);
> +		}

This general direction seems reasonable to me, since we're letting the
user know that their extra argument was ignored (though note that if it
was a misspelled option, like "--otuput", we would complain about that).
It's largely orthogonal to end-of-options, but I see how you got here by
wondering about it. :)

-Peff

^ permalink raw reply

* Re: [PATCH/RFC] sparse-checkout: take care of "--end-of-options" in set/add
From: Jeff King @ 2023-12-21  8:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git
In-Reply-To: <xmqqplz0p90k.fsf@gitster.g>

On Wed, Dec 20, 2023 at 06:46:51PM -0800, Junio C Hamano wrote:

> Josh Steadmon <steadmon@google.com> writes:
> 
> > I can confirm that this fixes an issue noticed by sparse-checkout users
> > at $DAYJOB. Looks good to me. Thanks!
> 
> Heh, there is another one that is not converted in the same file for
> "check-rules" subcommand, so the posted patch is way incomplete, I
> think.

Yeah. I think it is in the same boat as the other two, in that I believe
that the KEEP_UNKNOWN_OPT flag is counter-productive and should just be
dropped.

-Peff

^ permalink raw reply

* Re: [PATCH/RFC] sparse-checkout: take care of "--end-of-options" in set/add
From: Jeff King @ 2023-12-21  8:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, Josh Steadmon
In-Reply-To: <xmqqbkakqx6s.fsf@gitster.g>

On Wed, Dec 20, 2023 at 03:19:23PM -0800, Junio C Hamano wrote:

> 93851746 (parse-options: decouple "--end-of-options" and "--",
> 2023-12-06) updated the world order to make callers of parse-options
> that set PARSE_OPT_KEEP_UNKNOWN_OPT responsible for deciding what to
> do with "--end-of-options" they may see after parse_options() returns.
> 
> This unfortunately broke "sparse-checkout set/add", and from this
> invocation,
> 
>   "git sparse-checkout [add|set] --[no-]cone --end-of-options pattern..."
> 
> we now see "--end-of-options" listed in .git/info/sparse-checkout as if
> it is one of the path patterns.

Thanks for finding this. Though I only half-agree with the
"unfortunately" part. ;)

As argued in 93851746, any caller passing KEEP_UNKNOWN_OPT but _not_
handling the now-returned end-of-options is rather suspicious. Because
if they are planning to parse those options further, they will have no
idea that we saw --end-of-options, and will err in the unsafe direction
of still treating any elements with dashes as options.

Which would mean that simply dropping --end-of-options from the list, as
your patch does, would be similarly unsafe. It is papering over the
problem of:

  git sparse-checkout --end-of-options foo

but leaving:

  git sparse-checkout --end-of-options --foo

broken.

But the plot thickens! Curiously, in both of these cases, we do not do
any further parsing of the options at all. We just treat them as
pattern arguments with no special meaning.

So your patch is actually OK, but I would argue that the correct fix
here is to drop the use of PARSE_OPT_KEEP_UNKNOWN_OPT at all. I cannot
find any indication in the history on why it was ever used. It was added
when the command was converted to parse-options in 7bffca95ea
(sparse-checkout: add '--stdin' option to set subcommand, 2019-11-21).
Back then it was just called KEEP_UNKNOWN. Later it was renamed to
KEEP_UNKNOWN_OPT in 99d86d60e5 (parse-options: PARSE_OPT_KEEP_UNKNOWN
only applies to --options, 2022-08-19) to clarify that it was only about
dashed options; we always keep non-option arguments.

So looking at that original patch, it makes me think that the author was
confused about the mis-named option, and really just wanted to keep the
non-option arguments. We never should have used the flag all along (and
the other cases were cargo-culted within the file).

There is one minor gotcha, though. Unlike most other Git commands,
sparse-checkout will happily accept random option names and treat them
as non-option arguments. E.g. this works:

  git sparse-checkout add --foo --bar

and will add "--foo" and "--bar" as patterns. If we remove the flag,
we'd be breaking that. But I'd argue that anybody relying on that is
asking for trouble. For example, this does not work in the same way:

  git sparse-checkout add --skip-checks --foo

because "--skip-checks" is a real option. Ditto for any other options,
including those we add in the future. If you don't trust the contents of
your arguments, you should be using "--" or "--end-of-options" to
communicate the intent to the command.

-Peff

^ permalink raw reply

* Re: [RFC/PATCH] archive: "--list" does not take further options
From: René Scharfe @ 2023-12-21  7:30 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Jeff King
In-Reply-To: <xmqqttocp98r.fsf@gitster.g>

Am 21.12.23 um 03:41 schrieb Junio C Hamano:
> "git archive --list blah" should notice an extra command line
> parameter that goes unused.  Make it so.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  * This was done to convince myself that even though cmd_archive()
>    calls parse_options with PARSE_OPT_KEEP_UNKNOWN_OPT and then
>    uses the resulting argc/argv without apparent filtering of the
>    "--end-of-options" thing, it is correctly handling it, either
>    locally or remotely.
>
>    - locally, write_archive() uses parse_archive_args(), which calls
>      parse_options() without KEEP_UNKNOWN_OPT and "--end-of-options"
>      is handled there just fine.
>
>    - remotely, run_remote_archiver() relays the local parameters,
>      including "--end-of-options" via the "argument" packet.  Then
>      the arguments are assembled into a strvec and used by the
>      upload-archive running on the other side to spawn an
>      upload-archive--writer process with.
>      cmd_upload_archive_writer() then makes the same write_archive()
>      call; "--end-of-options" would still be in argv[] if the
>      original "git archive --remote" invocation had one, but it is
>      consumed the same way as the local case in write_archive() by
>      calling parse_archive_args().
>
>    I do not like the remote error behaviour this one adds at all.
>    Do we use a more proper mechanism to propagate a remote error
>    back for other subcommands we can reuse here?

Don't we have one?  It would affect other unsupported options as well,
and this seems to work just fine, e.g.:

   $ git archive --remote=. --format=foo HEAD
   remote: fatal: Unknown archive format 'foo'
   remote: git upload-archive: archiver died with error
   fatal: sent error to the client: git upload-archive: archiver died with error

>
>  archive.c           |  7 +++++++
>  t/t5000-tar-tree.sh | 14 ++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git c/archive.c w/archive.c
> index 9aeaf2bd87..3244e9f9f2 100644
> --- c/archive.c
> +++ w/archive.c
> @@ -641,6 +641,13 @@ static int parse_archive_args(int argc, const char **argv,
>  		base = "";
>
>  	if (list) {
> +		if (argc) {
> +			if (!is_remote)
> +				die(_("extra command line parameter '%s'"), *argv);
> +			else
> +				printf("!ERROR! extra command line parameter '%s'\n",
> +				       *argv);
> +		}

So just call die() here?

>  		for (i = 0; i < nr_archivers; i++)
>  			if (!is_remote || archivers[i]->flags & ARCHIVER_REMOTE)
>  				printf("%s\n", archivers[i]->name);
> diff --git c/t/t5000-tar-tree.sh w/t/t5000-tar-tree.sh
> index 918a2fc7c6..04592f45b0 100755
> --- c/t/t5000-tar-tree.sh
> +++ w/t/t5000-tar-tree.sh
> @@ -124,6 +124,20 @@ test_expect_success 'setup' '
>  	EOF
>  '
>
> +test_expect_success '--list notices extra parameters' '
> +	test_must_fail git archive --list blah &&
> +	# NEEDSWORK: remote error does not result in non-zero
> +	# exit, which we might want to change later.
> +	git archive --remote=. --list blah >remote-out &&
> +	grep "!ERROR! " remote-out

... and use test_must_fail in both cases?

> +'
> +
> +test_expect_success 'end-of-options is correctly eaten' '
> +	git archive --list --end-of-options &&
> +	git archive --remote=. --list --end-of-options >remote-out &&
> +	! grep "!ERROR! " remote-out
> +'
> +
>  test_expect_success 'populate workdir' '
>  	mkdir a &&
>  	echo simple textfile >a/a &&

^ permalink raw reply

* [PATCH 1/3] sparse-checkout: take care of "--end-of-options" in set/add/check-rules
From: Junio C Hamano @ 2023-12-21  6:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Josh Steadmon
In-Reply-To: <20231221065925.3234048-1-gitster@pobox.com>

93851746 (parse-options: decouple "--end-of-options" and "--",
2023-12-06) updated the world order to make callers of parse-options
that set PARSE_OPT_KEEP_UNKNOWN_OPT responsible for deciding what to
do with "--end-of-options" they may see after parse_options() returns.

This unfortunately broke "sparse-checkout set/add", and from this
invocation,

  "git sparse-checkout [add|set] --[no-]cone --end-of-options pattern..."

we now see "--end-of-options" listed in .git/info/sparse-checkout as if
it is one of the path patterns.

A breakage that results from the same cause exists in the check-rules
subcommand, but check-rules has a few other problems that need to be
corrected before it can fully work with --end-of-options safely,
which will be addressed later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/sparse-checkout.c          | 3 +++
 parse-options.c                    | 8 ++++++++
 parse-options.h                    | 2 ++
 t/t1090-sparse-checkout-scope.sh   | 8 ++++++++
 t/t1091-sparse-checkout-builtin.sh | 2 +-
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 5c8ffb1f75..8f55127202 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -779,6 +779,7 @@ static int sparse_checkout_add(int argc, const char **argv, const char *prefix)
 			     builtin_sparse_checkout_add_options,
 			     builtin_sparse_checkout_add_usage,
 			     PARSE_OPT_KEEP_UNKNOWN_OPT);
+	parse_opt_skip_end_of_options(&argc, &argv);
 
 	sanitize_paths(argc, argv, prefix, add_opts.skip_checks);
 
@@ -826,6 +827,7 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix)
 			     builtin_sparse_checkout_set_options,
 			     builtin_sparse_checkout_set_usage,
 			     PARSE_OPT_KEEP_UNKNOWN_OPT);
+	parse_opt_skip_end_of_options(&argc, &argv);
 
 	if (update_modes(&set_opts.cone_mode, &set_opts.sparse_index))
 		return 1;
@@ -998,6 +1000,7 @@ static int sparse_checkout_check_rules(int argc, const char **argv, const char *
 			     builtin_sparse_checkout_check_rules_options,
 			     builtin_sparse_checkout_check_rules_usage,
 			     PARSE_OPT_KEEP_UNKNOWN_OPT);
+	parse_opt_skip_end_of_options(&argc, &argv);
 
 	if (check_rules_opts.rules_file && check_rules_opts.cone_mode < 0)
 		check_rules_opts.cone_mode = 1;
diff --git a/parse-options.c b/parse-options.c
index d50962062e..fe265bbf68 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -1321,3 +1321,11 @@ void die_for_incompatible_opt4(int opt1, const char *opt1_name,
 		break;
 	}
 }
+
+void parse_opt_skip_end_of_options(int *argc, const char ***argv)
+{
+	if (*argc && !strcmp(**argv, "--end-of-options")) {
+		(*argc)--;
+		(*argv)++;
+	}
+}
diff --git a/parse-options.h b/parse-options.h
index bd62e20268..0d3354d4a8 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -498,6 +498,8 @@ int parse_opt_passthru_argv(const struct option *, const char *, int);
 /* value is enum branch_track* */
 int parse_opt_tracking_mode(const struct option *, const char *, int);
 
+void parse_opt_skip_end_of_options(int *argc, const char ***argv);
+
 #define OPT__VERBOSE(var, h)  OPT_COUNTUP('v', "verbose", (var), (h))
 #define OPT__QUIET(var, h)    OPT_COUNTUP('q', "quiet",   (var), (h))
 #define OPT__VERBOSITY(var) { \
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 3a14218b24..5b96716235 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -57,6 +57,14 @@ test_expect_success 'return to full checkout of main' '
 test_expect_success 'skip-worktree on files outside sparse patterns' '
 	git sparse-checkout disable &&
 	git sparse-checkout set --no-cone "a*" &&
+	cat .git/info/sparse-checkout >wo-eoo &&
+
+	git sparse-checkout disable &&
+	git sparse-checkout set --no-cone --end-of-options "a*" &&
+	cat .git/info/sparse-checkout >w-eoo &&
+
+	test_cmp wo-eoo w-eoo &&
+
 	git checkout-index --all --ignore-skip-worktree-bits &&
 
 	git ls-files -t >output &&
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index f67611da28..e33a6ed1b4 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -334,7 +334,7 @@ test_expect_success 'cone mode: set with nested folders' '
 
 test_expect_success 'cone mode: add independent path' '
 	git -C repo sparse-checkout set deep/deeper1 &&
-	git -C repo sparse-checkout add folder1 &&
+	git -C repo sparse-checkout add --end-of-options folder1 &&
 	cat >expect <<-\EOF &&
 	/*
 	!/*/
-- 
2.43.0-174-g055bb6e996


^ permalink raw reply related

* [PATCH 3/3] sparse-checkout: tighten add_patterns_from_input()
From: Junio C Hamano @ 2023-12-21  6:59 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, William Sprent
In-Reply-To: <20231221065925.3234048-1-gitster@pobox.com>

The add_patterns_from_input() function was introduced at 6fb705ab
(sparse-checkout: extract add_patterns_from_input(), 2020-02-11) and
then modified by 00408ade (builtin/sparse-checkout: add check-rules
command, 2023-03-27).  Throughout its life, it either allowed to
read patterns from the file (before 00408ade, it only allowed the
standard input, after 00408ade, an arbitrary FILE *) or from argv[],
but never both.  However, when we read from a file, the function
never checked that there is nothing in argv[] (in other words, the
command line arguments have fully been consumed), resulting in
excess arguments silently getting ignored.

This caused commands like "git sparse-checkout set [--stdin]" and
"git sparse-checkout check-rules [--rules-file <file>]" to silently
ignore the rest of the command line arguments without reporting.

The fix finally makes the --end-of-options handling for this
subcommand also work, so add test for it, too.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/sparse-checkout.c          |  4 ++++
 t/t1091-sparse-checkout-builtin.sh | 11 ++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 04ae81fce8..1166e1e298 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -555,6 +555,10 @@ static void add_patterns_from_input(struct pattern_list *pl,
 				    FILE *file)
 {
 	int i;
+
+	if (file && argc)
+		die(_("excess command line parameter '%s'"), argv[0]);
+
 	if (core_sparse_checkout_cone) {
 		struct strbuf line = STRBUF_INIT;
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index e33a6ed1b4..107ed170ac 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -937,6 +937,10 @@ test_expect_success 'check-rules cone mode' '
 	EOF
 
 	git -C bare ls-tree -r --name-only HEAD >all-files &&
+
+	test_must_fail git -C bare sparse-checkout check-rules --cone \
+		--rules-file ../rules excess args <all-files &&
+
 	git -C bare sparse-checkout check-rules --cone \
 		--rules-file ../rules >check-rules-file <all-files &&
 
@@ -947,6 +951,7 @@ test_expect_success 'check-rules cone mode' '
 	git -C repo sparse-checkout check-rules >check-rules-default <all-files &&
 
 	test_grep "deep/deeper1/deepest/a" check-rules-file &&
+	test_grep ! "end-of-options" check-rules-file &&
 	test_grep ! "deep/deeper2" check-rules-file &&
 
 	test_cmp check-rules-file ls-files &&
@@ -959,8 +964,12 @@ test_expect_success 'check-rules non-cone mode' '
 	EOF
 
 	git -C bare ls-tree -r --name-only HEAD >all-files &&
+
+	test_must_fail git -C bare sparse-checkout check-rules --no-cone \
+		--rules-file ../rules excess args <all-files &&
+
 	git -C bare sparse-checkout check-rules --no-cone --rules-file ../rules\
-		>check-rules-file <all-files &&
+		--end-of-options >check-rules-file <all-files &&
 
 	cat rules | git -C repo sparse-checkout set --no-cone --stdin &&
 	git -C repo ls-files -t >out &&
-- 
2.43.0-174-g055bb6e996


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox