Git development
 help / color / mirror / Atom feed
* Re: [Suggestion] Documentation: quick-install-man not propagating DESTDIR
From: Junio C Hamano @ 2020-09-02 19:44 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: 'git'
In-Reply-To: <077a01d680af$2ad65510$8082ff30$@nexbridge.com>

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> The make quick-install-man rule is not propagating DESTDIR when GNU Make
> 4.2.1 is used.

I wonder, instead of having to change all "$(MAKE) -C elsewhere", we
can add DESTDIR to the list of variables that are exported.

... goes and looks ...

Hmph, DESTDIR is exported together with DIFF, TAR, INSTALL and
SHELL_PATH.  We do rely on SHELL_PATH to be exported correctly to
t/Makefile for "make test" to work, so it is puzzling.

It is doubly puzzling that we use $(INSTALL) in Documentation/Makefile
on the same line as $(DESTDIR) is used, and apparently you are not
reporting problem on that one.

> It seems like a bit of a nit to report this, but I discovered that the
> installation is not putting the manuals in the same place as git. It’s a
> pretty simple fix. I can put a patch together if desired.

I do not think we want that patch.  Instead I think we'd want a
patch that uses the same trick as what makes INSTALL work.

Thanks.

> diff --git a/Makefile b/Makefile
> index 372139f1f2..dae2d99a7f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2992,10 +2992,10 @@ install-gitweb:
>         $(MAKE) -C gitweb install
>
> install-doc: install-man-perl
> -       $(MAKE) -C Documentation install
> +       $(MAKE) -C Documentation install DESTDIR=$(DESTDIR)
>
> install-man: install-man-perl
> -       $(MAKE) -C Documentation install-man
> +       $(MAKE) -C Documentation install-man DESTDIR=$(DESTDIR)
>
> install-man-perl: man-perl
>         $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(mandir_SQ)/man3'
> @@ -3006,19 +3006,19 @@ install-html:
>         $(MAKE) -C Documentation install-html
>
> install-info:
> -       $(MAKE) -C Documentation install-info
> +       $(MAKE) -C Documentation install-info DESTDIR=$(DESTDIR)
>
> install-pdf:
> -       $(MAKE) -C Documentation install-pdf
> +       $(MAKE) -C Documentation install-pdf DESTDIR=$(DESTDIR)
>
> quick-install-doc:
> -       $(MAKE) -C Documentation quick-install
> +       $(MAKE) -C Documentation quick-install DESTDIR=$(DESTDIR)
>
> quick-install-man:
> -       $(MAKE) -C Documentation quick-install-man
> +       $(MAKE) -C Documentation quick-install-man DESTDIR=$(DESTDIR)
>
> quick-install-html:
> -       $(MAKE) -C Documentation quick-install-html
> +       $(MAKE) -C Documentation quick-install-html DESTDIR=$(DESTDIR)
>
> -- Brief whoami:
> NonStop developer since approximately 211288444200000000
> UNIX developer since approximately 421664400
> -- In my real life, I talk too much.

^ permalink raw reply

* Re: Fastest way to set files date and time to latest commit time of each one
From: Ivan Baldo @ 2020-09-02 19:28 UTC (permalink / raw)
  To: Eric Wong; +Cc: git
In-Reply-To: <20200829044842.GA5732@dcvr>

  Hello everyone!
  I just now managed to get time to work again on this, sorry for
replying so late but wanted to do a single reply with the conclusion
if possible.
  Let me tell you that I feel very humbled by all your replies, thanks
a lot for your time and concern with my inquiry!
  Eric's script is not only in Debian but also in CentOS 7 (and I
guess Red Hat 7 too) in
/usr/share/doc/rsync-*/support/git-set-file-times.
  My use case is similar to his: a cluster of identically configured
web servers with autoscaling (tested up to 100 servers) which when
they boot (or there is a new version of any of the websites), rsync
the current version from another server.
  So currently if we build the system image of the web servers in
tandem with the central server everything works smoothly, the problem
is when we recreate from scratch any of the pre-saved images, in which
case we get the dates mismatch and unnecessary rsync checksumming when
put to production.
  Will use Eric's script from CentOS 7 as-is from now on, to avoid the
mismatch and mix pre-saved VM images without issues (slowness in
autoscaling).
  Thanks a lot to you all!
  Let me know if any of you comes to Uruguay, you got free beers here!
  Have a great day.


El sáb., 29 de ago. de 2020 a la(s) 01:48, Eric Wong (e@yhbt.net) escribió:
>
> Ivan Baldo <ibaldo@gmail.com> wrote:
> >   Hello.
> >   I know this is not standard usage of git, but I need a way to have
> > more stable dates and times in the files in order to avoid rsync
> > checksumming.
> >   So I found this
> > https://stackoverflow.com/questions/2179722/checking-out-old-file-with-original-create-modified-timestamps/2179876#2179876
> > and modified it a bit to run in CentOS 7:
> >
> > IFS="
> > "
> > for FILE in $(git ls-files -z | tr '\0' '\n')
> > do
> >     TIME=$(git log --pretty=format:%cd -n 1 --date=iso -- "$FILE")
> >     touch -c -m -d "$TIME" "$FILE"
> > done
> >
> >   Unfortunately it takes ages for a 84k files repo.
> >   I see the CPU usage is dominated by the git log command.
>
> running git log for each file isn't necessary.
>
> On Debian, rsync actually ships the `git-set-file-times' script
> in /usr/share/doc/rsync/scripts/ which only runs `git log' once
> and parses it.
>
> You can also get my (original) version from:
> https://yhbt.net/git-set-file-times
>
> >   I know a way I could use to split the work for all the CPU threads
> > but anyway, I would like to know if you guys and girls know of a
> > faster way to do this.
>
> Much of your overhead is going to be from process spawning.
> My Perl version reduces that significantly.
>
> I haven't tried it with 84K files, but it'll have to keep all
> those filenames in memory.  I'm not sure if parallelizing
> utime() syscalls is worth it, either; maybe it helps on SSD
> more than HDD.



-- 
Ivan Baldo - ibaldo@gmail.com - http://ibaldo.codigolibre.net/
Freelance C++/PHP programmer and GNU/Linux systems administrator.
The sky isn't the limit!

^ permalink raw reply

* Re: [PATCH 0/2] ci: avoid ugly "failure" annotation in the GitHub workflow
From: Junio C Hamano @ 2020-09-02 19:21 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
In-Reply-To: <pull.719.git.1598991568.gitgitgadget@gmail.com>

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Whenever the GitHub workflow runs in a fork that does not contain the
> special-purpose ci-config branch, a big fat failure annotation greets the
> casual reader. See e.g. 
> https://github.com/gitgitgadget/git/actions/runs/233438295
>
> This is caused by the (non-fatal) failure to clone said branch. Let's avoid
> that. It's distracting.

Thanks.  Anything that reduces the CI noise is very much welcomed.


^ permalink raw reply

* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Junio C Hamano @ 2020-09-02 19:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git
In-Reply-To: <20200902075439.GA855335@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> There are other variants, too:
>
>   - we could use malloc(1) versus xmalloc(0). Maybe more
>     readable/obvious? But also potentially allocates an extra byte when
>     the platform malloc(0) would not need to.
>
>   - we could return a non-NULL "ptr" without shrinking it at all (nor
>     allocating anything new). This is perfectly legal, and the
>     underlying realloc() would still know the original size if anybody
>     ever asked to grow it back again.
>
> I have to admit I don't overly care between them.

I don't either.  I admit that the latter I didn't think of---it
feels tricky and harder to reason about than any other variants.

> I suspect one of the
> reasons we never ran into this 15-year-old bug is that it's quite hard
> to convince Git to call realloc(0) in the first place. I only saw it
> when investigating a bug in another series, and there the problem turned
> out to be reading garbage bytes off the end of a buffer (which we
> interpreted as a serialized ewah bitmap which happened to have a zero in
> its length byte).

Thanks.


^ permalink raw reply

* Re: What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Junio C Hamano @ 2020-09-02 19:12 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List
In-Reply-To: <CAPig+cRn3e=A4Bd5h7+y9NgOws7Pif86vdW8M1RwQaP_q-Es5Q@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

> Okay. In that case, perhaps a slight modification of your wording would work?
>
>     "git worktree" gained a "repair" subcommand to help users recover
>     after moving the worktrees or repository manually without telling
>     Git.  Also, "git init --separate-git-dir" no longer corrupts
>     administrative data related to linked worktrees.

Perfect.

^ permalink raw reply

* [PATCH] fetch: no FETCH_HEAD display if --no-write-fetch-head
From: Jonathan Tan @ 2020-09-02 19:02 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, derrickstolee

887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update",
2020-08-18) introduced the ability to disable writing to FETCH_HEAD
during fetch, but did not suppress the "<source> -> FETCH_HEAD" message
when this ability is used. This message is misleading in this case,
because FETCH_HEAD is not written. Also, because "fetch" is used to
lazy-fetch missing objects in a partial clone, this significantly
clutters up the output in that case since the objects to be fetched are
potentially numerous. Therefore, suppress this message when
--no-write-fetch-head is passed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This is on origin/jt/lazy-fetch.

This might be important for Stolee's maintenance prefetch patch [1] too
- presumably we don't want to show FETCH_HEAD there, as it would be
misleading and would clutter in the same way (albeit to a lesser
extent).

[1] https://lore.kernel.org/git/da64c51a8182ec13aeed8f0157079fb29a09ee85.1598380599.git.gitgitgadget@gmail.com/
---
 builtin/fetch.c          | 3 ++-
 t/t0410-partial-clone.sh | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 320ba9471d..442df05f5a 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1023,11 +1023,12 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else if (write_fetch_head) {
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index d681e90640..584a039b85 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -183,7 +183,7 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 '
 
 test_expect_success 'fetching of missing objects' '
-	rm -rf repo &&
+	rm -rf repo err &&
 	test_create_repo server &&
 	test_commit -C server foo &&
 	git -C server repack -a -d --write-bitmap-index &&
@@ -194,7 +194,10 @@ test_expect_success 'fetching of missing objects' '
 
 	git -C repo config core.repositoryformatversion 1 &&
 	git -C repo config extensions.partialclone "origin" &&
-	git -C repo cat-file -p "$HASH" &&
+	git -C repo cat-file -p "$HASH" 2>err &&
+
+	# Ensure that no spurious FETCH_HEAD messages are written
+	! grep FETCH_HEAD err &&
 
 	# Ensure that the .promisor file is written, and check that its
 	# associated packfile contains the object
-- 
2.28.0.402.g5ffc5be6b7-goog


^ permalink raw reply related

* Re: What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Eric Sunshine @ 2020-09-02 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List
In-Reply-To: <xmqqwo1caw2y.fsf@gitster.c.googlers.com>

On Wed, Sep 2, 2020 at 2:10 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > The current implementation also helps out when the main worktree (or
> > bare repository) is moved.
>
> That is why I wrote "secondary worktrees" initially and then dropped
> "secondary" from the description ;-)

I understand now. Perhaps it's just me, but when I see the term
"worktree", I automatically think "linked worktre" (what you call
"secondary"), and don't usually think of the main worktree. But this
also fixes the problem when a bare repository is moved. So...

> Well given that the primary reason why I add these blub in What's
> cooking is to draft the release notes for upcoming release, my
> preference is to give "we help these cases" than giving overly large
> promises to our users.

Okay. In that case, perhaps a slight modification of your wording would work?

    "git worktree" gained a "repair" subcommand to help users recover
    after moving the worktrees or repository manually without telling
    Git.  Also, "git init --separate-git-dir" no longer corrupts
    administrative data related to linked worktrees.

^ permalink raw reply

* Re: What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Junio C Hamano @ 2020-09-02 18:10 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List
In-Reply-To: <CAPig+cTGspJAGxRu+vqdko1ntkBonVaoStYde3+P5UxPxrCs7g@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Sep 2, 2020 at 12:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Sunshine <sunshine@sunshineco.com> writes:
>> > I wonder if this could be reworded so it's clearer that "git worktree
>> > repair" is a new command, and to mention fixes to "git init
>> > --separate-git-dir". Perhaps like this?
>> >
>> >     "git worktree" gained a "repair" subcommand to help users recover
>> >     from problems arising from factors outside of Git's control.
>> >     Also, "git init --separate-git-dir" no longer corrupts
>> >     administrative data related to linked worktrees.
>>
>> OK that reads much better.
>>
>> -from problems arising from factors outside of Git's control.
>> +after moving the worktrees manually without telling Git.
>>
>> The latter is slightly shorter; does the "repair" help situations
>> other than that, or is the above cover all the "factors outside" out
>> control?
>
> The current implementation also helps out when the main worktree (or
> bare repository) is moved.

That is why I wrote "secondary worktrees" initially and then dropped
"secondary" from the description ;-)

> However, in the "git worktree repair" documentation, I intentionally
> avoided nailing down precisely the problems it repairs, instead
> leaving it open-ended since it may learn more repairs in the future.
> (The documentation is careful to say that it repairs "administrative
> files", and then talks about the currently-implemented repairs as
> _examples_ of what it might repair, without locking it into only those
> repairs.)
>
> I think the same generality of description can apply to the blurb
> here, as well. We don't necessarily need to give precise detail in
> this blurb -- the reader can learn the details by consulting the
> documentation.

Well given that the primary reason why I add these blub in What's
cooking is to draft the release notes for upcoming release, my
preference is to give "we help these cases" than giving overly large
promises to our users.

So...

^ permalink raw reply

* Re: [PATCH] RFC: refs: add GIT_DEBUG_REFS debugging mechanism
From: Jonathan Tan @ 2020-09-02 17:49 UTC (permalink / raw)
  To: gitgitgadget; +Cc: git, hanwenn, hanwen, git, stolee, Jonathan Tan
In-Reply-To: <pull.713.git.1598628529512.gitgitgadget@gmail.com>

> This should be integrated with the trace2 sub-system, and I would appreciate
> pointers on how to start.

The trace2 subsystem seems to be designed to detect errors coarsely
(e.g. by looking at process invocations) and log timings. It currently
doesn't seem to be made for this kind of fine debugging information -
and perhaps this is by design, because such logging information would
not be useful to most users and would just clutter up the logs.

But I think there is a place for this in Git - in particular, we have
packet tracing (GIT_TRACE_PACKET), and this has been useful both in
automated tests (t/????-*.sh) and in manual tests. Ref backend tracing
seems to be similar. But this would be best if we had an additional
option that could control whether ref backend tracing was on or off,
independent from other things like the trace2 target.

Is the plan to migrate all usages of "trace" to "trace2" or for both to
exist simultaneously? If the latter, then ref backend tracing could just
use "trace", but if the former, we'd have to designing this additional
option.

^ permalink raw reply

* Re: [PATCH v2] Makefile: add support for generating JSON compilation database
From: Junio C Hamano @ 2020-09-02 17:21 UTC (permalink / raw)
  To: Philippe Blain via GitGitGadget
  Cc: git, brian m. carlson, Philippe Blain, Jeff King
In-Reply-To: <pull.714.v2.git.1599001759548.gitgitgadget@gmail.com>

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/.gitignore b/.gitignore
> index ee509a2ad2..f4c51300e0 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -197,6 +197,7 @@
>  /git.spec
>  *.exe
>  *.[aos]
> +*.o.json

Does this naming/suffix follow the common practice followed among
those projects that use the -MJ option?  Even though I may have
heard of it, I am not familiar with the use of the feature at all,
and to such a person, naming a file after what format it is written
in (i.e. 'json') feels a lot less useful than what it contains
(i.e. compilation database entries).  

It's like naming our source files with .txt suffix ;-)

> +# Define GENERATE_COMPILATION_DATABASE to generate JSON compilation database
> +# entries during compilation if your compiler supports it, using the `-MJ` flag.
> +# The JSON entries will be placed in the `compile_commands/` directory,
> +# and the JSON compilation database 'compile_commands.json' will be created 
> +# at the root of the repository. 

Likewise for the name of the directory and the resulting single
database file.  I just want to make sure that we are following the
common convention, so people familiar with the use of the feature
would feel at home, so a simple answer "yes" would suffice.

> +ifdef GENERATE_COMPILATION_DATABASE
> +compdb_check = $(shell $(CC) $(ALL_CFLAGS) \
> +	-c -MJ /dev/null \
> +	-x c /dev/null -o /dev/null 2>&1; \
> +	echo $$?)
> +ifeq ($(compdb_check),0)
> +override GENERATE_COMPILATION_DATABASE = yes

This feels strange.  If the end user said to GENERATE and we find we
are capable, we still override to 'yes'?  What if the end user set
'no' to the GENERATE_COMPILATION_DATABASE macro?  Shouldn't we be
honoring that wish?

> +else
> +override GENERATE_COMPILATION_DATABASE = no
> +$(warning GENERATE_COMPILATION_DATABASE is set, but your compiler does not \
> +support generating compilation database entries)

This side is perfectly understandable and I think it is a valid use
of override.  But I do not understand the other side.

> @@ -2381,16 +2401,30 @@ missing_dep_dirs =
>  dep_args =
>  endif
>  
> +compdb_dir = compile_commands/
> +
> +ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
> +missing_compdb_dir = $(compdb_dir)
> +$(missing_compdb_dir):
> +	@mkdir -p $@
> +
> +compdb_file = $(compdb_dir)$(subst .-,,$(subst /,-,$(dir $@)))$(notdir $@).json

This detail does not matter as long as the end result ensures unique
output for all source files, but I am having trouble guessing what
the outermost subst, which removes ".-" sequence, is trying to make
prettier.  Care to explain?

> +compdb_args = -MJ $(compdb_file)
> +else
> +missing_compdb_dir =
> +compdb_args =

These are unfortunate.  I briefly wondered if we can make GIT-CFLAGS
depend on these two variables so that ASM_OBJ and C_OBJ do not have
to depend on them, but the actual rules need to be updated anyway,
so it cannot be helped.

I do wonder if we can unify dep_args and compdb_args into a single
extra_args (and similarly dep_dirs and compdb_dir to extra_dirs) so
that future similar options can all piggyback on it, but this can do
for now.

> @@ -2413,6 +2447,14 @@ else
>  $(OBJECTS): $(LIB_H) $(GENERATED_H)
>  endif
>  
> +ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
> +all:: compile_commands.json
> +compile_commands.json:
> +	@$(RM) $@
> +	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)*.o.json > $@+

OK.  The entire thing is concatenated and enclosed by a single pair
of '[' and ']'.

If we are planning to allow developers customize compdb_dir,
requiring trailing slash in the value of $(compdb_dir) is not very
friendly.


Thanks.

^ permalink raw reply

* Re: What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Eric Sunshine @ 2020-09-02 16:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List
In-Reply-To: <xmqq7dtcceka.fsf@gitster.c.googlers.com>

On Wed, Sep 2, 2020 at 12:46 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > I wonder if this could be reworded so it's clearer that "git worktree
> > repair" is a new command, and to mention fixes to "git init
> > --separate-git-dir". Perhaps like this?
> >
> >     "git worktree" gained a "repair" subcommand to help users recover
> >     from problems arising from factors outside of Git's control.
> >     Also, "git init --separate-git-dir" no longer corrupts
> >     administrative data related to linked worktrees.
>
> OK that reads much better.
>
> -from problems arising from factors outside of Git's control.
> +after moving the worktrees manually without telling Git.
>
> The latter is slightly shorter; does the "repair" help situations
> other than that, or is the above cover all the "factors outside" out
> control?

The current implementation also helps out when the main worktree (or
bare repository) is moved.

However, in the "git worktree repair" documentation, I intentionally
avoided nailing down precisely the problems it repairs, instead
leaving it open-ended since it may learn more repairs in the future.
(The documentation is careful to say that it repairs "administrative
files", and then talks about the currently-implemented repairs as
_examples_ of what it might repair, without locking it into only those
repairs.)

I think the same generality of description can apply to the blurb
here, as well. We don't necessarily need to give precise detail in
this blurb -- the reader can learn the details by consulting the
documentation.

^ permalink raw reply

* Re: What's cooking in git.git (Sep 2020, #01; Tue, 1)
From: Junio C Hamano @ 2020-09-02 16:45 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List
In-Reply-To: <CAPig+cTUhLoPvs+ygnc0Y4Ez3M3tfGncPzON0ejb=xEOMBixHQ@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Tue, Sep 1, 2020 at 5:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>> * jc/run-command-use-embedded-args (2020-08-26) 1 commit
>>  Various callers of run_command API has been modernized.
>
> s/has/have/
>
>> * es/worktree-repair (2020-08-31) 5 commits
>>   (merged to 'next' on 2020-08-31 at 604825c5e4)
>>  + init: make --separate-git-dir work from within linked worktree
>>  + init: teach --separate-git-dir to repair linked worktrees
>>  + worktree: teach "repair" to fix outgoing links to worktrees
>>  + worktree: teach "repair" to fix worktree back-links to main worktree
>>  + worktree: add skeleton "repair" command
>>
>>  "git worktree repair" command to correct on-disk pointers between
>>  the repository and its secondary working trees.
>
> I wonder if this could be reworded so it's clearer that "git worktree
> repair" is a new command, and to mention fixes to "git init
> --separate-git-dir". Perhaps like this?
>
>     "git worktree" gained a "repair" subcommand to help users recover
>     from problems arising from factors outside of Git's control.
>     Also, "git init --separate-git-dir" no longer corrupts
>     administrative data related to linked worktrees.

OK that reads much better.

-from problems arising from factors outside of Git's control.
+after moving the worktrees manually without telling Git.

The latter is slightly shorter; does the "repair" help situations
other than that, or is the above cover all the "factors outside" out
control?

Thanks.

^ permalink raw reply

* Re: [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config
From: Eric Sunshine @ 2020-09-02 16:38 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren,
	Jonathan Tan, Jonathan Nieder
In-Reply-To: <CAHd-oW5gk16=Tmhi_e43vVCufM9=zd8jtZJ_EFJVLmC4dJ06=w@mail.gmail.com>

On Wed, Sep 2, 2020 at 12:16 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
> With that said, I'm wondering now whether we should change the
> function's signature from:
>
> `check_config [expect_code <code>] <cmd> <key> <expected_value>`
>
> to:
>
> `check_config <cmd> <key> <expected_value>`
> `check_config expect_not_found <cmd> <key> <value>`
>
> The second form would then automatically expect exit code 1 and check
> stdout for the message 'Value not found for "<value>"'. With this we
> can avoid wrong uses of check_config to check an arbitrary error code
> without also checking stderr.

Yes, that seems more straightforward. In fact, at this point, you
could just have two distinct functions and eliminate the ugly
complexity of the existing check_config() implementation. Perhaps
something like this (typed in email):

    check_config () {
        test_tool config "$1" "$2" >actual &&
        shift && shift &&
        printf "%s\n" "$@" >expect &&
        test_cmp expect actual
    }

    check_not_found () {
        test_expect_code 1 test_tool config "$1" "$2" >actual &&
        echo "Value not found for \"$2\"" >expect &&
        test_cmp expect actual
    }

^ permalink raw reply

* Repo state broken due to mismatched file name casing during merge
From: Rafał Grzybowski @ 2020-09-02 16:26 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

mkdir repo
cd repo

git init

"sample file" > file.txt
git add file.txt
git commit -m "Added file.txt"


git checkout -b other_branch
echo other file > other_file.txt
git add other_file.txt
git commit -m "Added other_file.txt"

git checkout master
echo Other file > Other_file.txt
git add Other_file.txt
git commit -m "Added Other_file.txt"

git merge other_branch
git status

What did you expect to happen? (Expected behavior)

A clean state, no unstaged changes.

What happened instead? (Actual behavior)

There is always an unstaged file other_file.txt which case changes if
I try to discard and the unstaged change stays.
If I try to delete the file, I get two unstaged file removal changes.

What's different between what you expected and what actually happened?

The state should be clean. It looks like the merge process broke
something due to the casing.

Anything else you want to add:

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.28.0.windows.1
cpu: x86_64
built from commit: 77982caf269b7ee713a76da2bcf260c34d3bf7a7
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
uname: Windows 10.0 18363
compiler info: gnuc: 10.2
libc info: no libc information available
$SHELL (typically, interactive shell): <unset>

^ permalink raw reply

* Re: [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config
From: Matheus Tavares Bernardino @ 2020-09-02 16:16 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren,
	Jonathan Tan, Jonathan Nieder
In-Reply-To: <CAPig+cSLjMabeLgkg9N7+ZJ1jotbpJx0FAnjkpNSt0Lf+Q0wNQ@mail.gmail.com>

On Wed, Sep 2, 2020 at 3:57 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> > One test in t1308 expects test-config to fail with exit code 128 due to
> > a parsing error in the config machinery. But test-config might also exit
> > with 128 for any other reason that leads it to call die(). Therefore the
> > test can potentially succeed for the wrong reason. To avoid false
> > positives, let's check test-config's output, in addition to the exit
> > code, and make sure that the cause of the error is the one we expect in
> > this test.
> >
> > Moreover, the test was using the auxiliary function check_config which
> > optionally takes a string to compare the test-config stdout against.
> > Because this string is optional, there is a risk that future callers may
> > also check only the exit code and not the output. To avoid that, make
> > the string parameter of this function mandatory.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> > diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
> > @@ -14,10 +14,7 @@ check_config () {
> >                 expect_code=0
> >         fi &&
> >         op=$1 key=$2 && shift && shift &&
> > -       if test $# != 0
> > -       then
> > -               printf "%s\n" "$@"
> > -       fi >expect &&
> > +       printf "%s\n" "$@" >expect &&
>
> This change in behavior is quite subtle. With the original code,
> "expect" will be entirely empty if no argument is provided, whereas
> with the revised code, "expect" will contain a single newline. This
> could be improved by making the argument genuinely mandatory as stated
> in the commit message. Perhaps something like this:
>
>     if test $# -eq 0
>     then
>         BUG "check_config 'value' argument missing"
>     fi &&
>     printf "%s\n" "$@" >expect &&

Thanks for catching this. I will add the check.

> > @@ -130,7 +127,8 @@ test_expect_success 'check line error when NULL string is queried' '
> >  test_expect_success 'find integer if value is non parse-able' '
> > -       check_config expect_code 128 get_int lamb.head
> > +       test_expect_code 128 test-tool config get_int lamb.head 2>result &&
> > +       test_i18ngrep "fatal: bad numeric config value '\'none\'' for '\'lamb.head\''" result
> >  '
>exit
> The complex '\'quoting\'' magic leaves and re-enters the single-quote
> context of the test body and makes it difficult to reason about. Since
> this is a pattern argument to grep, a simpler alternative would be:
>
>     test_i18ngrep "fatal: bad numeric config value .none. for
> .lamb.head." result

Will do, thanks.

> Aside from that, do I understand correctly that all other callers
> which expect a non-zero exit code will find the error message on
> stdout, but this case will find it on stderr?

Right. This happens because, for a "value not found" error,
test-config will exit with code 1 and print to stdout. This is the
only case where it exits with a non-zero code and prints to stdout
instead of stderr.

With that said, I'm wondering now whether we should change the
function's signature from:

`check_config [expect_code <code>] <cmd> <key> <expected_value>`

to:

`check_config <cmd> <key> <expected_value>`
`check_config expect_not_found <cmd> <key> <value>`

The second form would then automatically expect exit code 1 and check
stdout for the message 'Value not found for "<value>"'. With this we
can avoid wrong uses of check_config to check an arbitrary error code
without also checking stderr.

> That makes one wonder
> if, rather than dropping use of check_config() here, instead
> check_config() should be enhanced to accept an additional option, such
> as 'stderr' which causes it to check stderr rather than stdout
> (similar to how 'expect_code' allows the caller to override the
> expected exit code). But perhaps that would be overengineered if this
> case is not expected to come up again as more callers are added in the
> future?

That's an interesting idea. However, because some callers may want to
use test_i18ngrep instead of test_cmp, I think the required logic
would become too complex.

^ permalink raw reply

* Re: [PATCH v2 03/11] merge-index: libify merge_one_path() and merge_all()
From: Alban Gruin @ 2020-09-02 15:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, phillip.wood
In-Reply-To: <xmqqft81cidt.fsf@gitster.c.googlers.com>

Le 01/09/2020 à 23:11, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>> The "resolve" and "octopus" merge strategies do not call directly `git
>> merge-one-file', they delegate the work to another git command, `git
>> merge-index', that will loop over files in the index and call the
>> specified command.  Unfortunately, these functions are not part of
>> libgit.a, which means that once rewritten, the strategies would still
>> have to invoke `merge-one-file' by spawning a new process first.
>>
>> To avoid this, this moves merge_one_path(), merge_all(), and their
>> helpers to merge-strategies.c.  They also take a callback to dictate
>> what they should do for each file.  For now, only one launching a new
>> process is defined to preserve the behaviour of the builtin version.
> 
> ... of the "builtin" version?  I thought this series is introducing
> a new builtin version?  Puzzled...
> 

`merge-index' is already a builtin, this step libifies it.  Its core
feature is to call repeatedly a command (usually it's
`git-merge-one-file'), but the new version will call a callback instead,
so its behaviour is not hardcoded.  This patch only provides a callback
starting a new command to preserve its behaviour.

Perhaps rewording the last sentence like this would be better?

  For now, only one launching a new process is defined, to preserve the
behaviour of `merge-index'.


^ permalink raw reply

* Re: [PATCH v2 02/11] merge-one-file: rewrite in C
From: Alban Gruin @ 2020-09-02 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, phillip.wood
In-Reply-To: <xmqqk0xdcim6.fsf@gitster.c.googlers.com>

Hi Junio,

Le 01/09/2020 à 23:06, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>> diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c
>> new file mode 100644
>> index 0000000000..306a86c2f0
>> --- /dev/null
>> +++ b/builtin/merge-one-file.c
>> @@ -0,0 +1,85 @@
>> +/*
>> + * Builtin "git merge-one-file"
>> + *
>> + * Copyright (c) 2020 Alban Gruin
>> + *
>> + * Based on git-merge-one-file.sh, written by Linus Torvalds.
>> + *
>> + * This is the git per-file merge utility, called with
>> + *
>> + *   argv[1] - original file SHA1 (or empty)
>> + *   argv[2] - file in branch1 SHA1 (or empty)
>> + *   argv[3] - file in branch2 SHA1 (or empty)
>> + *   argv[4] - pathname in repository
>> + *   argv[5] - original file mode (or empty)
>> + *   argv[6] - file in branch1 mode (or empty)
>> + *   argv[7] - file in branch2 mode (or empty)
>> + *
>> + * Handle some trivial cases. The _really_ trivial cases have been
>> + * handled already by git read-tree, but that one doesn't do any merges
>> + * that might change the tree layout.
>> + */
>> +
>> +#include "cache.h"
>> +#include "builtin.h"
>> +#include "lockfile.h"
>> +#include "merge-strategies.h"
>> +
>> +static const char builtin_merge_one_file_usage[] =
>> +	"git merge-one-file <orig blob> <our blob> <their blob> <path> "
>> +	"<orig mode> <our mode> <their mode>\n\n"
>> +	"Blob ids and modes should be empty for missing files.";
>> +
>> +int cmd_merge_one_file(int argc, const char **argv, const char *prefix)
>> +{
>> +	struct object_id orig_blob, our_blob, their_blob,
>> +		*p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL;
>> +	unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0;
>> +	struct lock_file lock = LOCK_INIT;
>> +
>> +	if (argc != 8)
>> +		usage(builtin_merge_one_file_usage);
>> +
>> +	if (repo_read_index(the_repository) < 0)
>> +		die("invalid index");
>> +
>> +	repo_hold_locked_index(the_repository, &lock, LOCK_DIE_ON_ERROR);
> 
> I do understand why we would want merge_strategies_one_file() helper
> introduced by this step so that the helper can work in an arbitrary
> repository (hence taking a pointer to repository structure as one of
> its parameters).
> 
> But the "merge-one-file" command will always work in the_repository.
> I do not see a point in using helpers that can work in an arbitrary
> repository, like repo_read_index() or repo_hold_locked_index(), in
> the above.  I only see downsides --- it is longer to read, makes
> readers wonder if there is something tricky involving another
> repository going on, etc.
> 

I was under the impression that using the_index is just deprecated, and
that we ought to avoid using it, even in builtins.

Will update that.

>> +	if (!get_oid(argv[1], &orig_blob)) {
>> +		p_orig_blob = &orig_blob;
>> +		orig_mode = strtol(argv[5], NULL, 8);
> 
> Write a wrapper around strtol(...,...,8) to reduce repetition, and
> make sure you do not pass NULL as the second parameter to strtol()
> to always check you parsed the string to the end.
> 
>> +	ret = merge_strategies_one_file(the_repository,
>> +					p_orig_blob, p_our_blob, p_their_blob, argv[4],
>> +					orig_mode, our_mode, their_mode);
> 
> Here, as I said above, it is perfectly fine to pass
> the_repository().
> 
>> +	if (ret) {
>> +		rollback_lock_file(&lock);
>> +		return ret;
>> +	}
>> +
>> +	return write_locked_index(the_repository->index, &lock, COMMIT_LOCK);
> 
> Likewise, I do not see much point in saying the_repository->index; the_index
> is a perfectly fine short-hand.
> 
> -%<-
>> +static int do_merge_one_file(struct index_state *istate,
>> +			     const struct object_id *orig_blob,
>> +			     const struct object_id *our_blob,
>> +			     const struct object_id *their_blob, const char *path,
>> +			     unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode)
>> +{
>> +	int ret, i, dest;
>> +	mmbuffer_t result = {NULL, 0};
>> +	mmfile_t mmfs[3];
>> +	struct ll_merge_options merge_opts = {0};
>> +	struct cache_entry *ce;
>> +
>> +	if (our_mode == S_IFLNK || their_mode == S_IFLNK)
>> +		return error(_("%s: Not merging symbolic link changes."), path);
>> +	else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK)
>> +		return error(_("%s: Not merging conflicting submodule changes."), path);
>> +
>> +	read_mmblob(mmfs + 1, our_blob);
>> +	read_mmblob(mmfs + 2, their_blob);
>> +
>> +	if (orig_blob) {
>> +		printf(_("Auto-merging %s\n"), path);
>> +		read_mmblob(mmfs + 0, orig_blob);
>> +	} else {
>> +		printf(_("Added %s in both, but differently.\n"), path);
>> +		read_mmblob(mmfs + 0, &null_oid);
>> +	}
>> +
>> +	merge_opts.xdl_opts = XDL_MERGE_ZEALOUS_ALNUM;
>> +	ret = ll_merge(&result, path,
>> +		       mmfs + 0, "orig",
>> +		       mmfs + 1, "our",
>> +		       mmfs + 2, "their",
>> +		       istate, &merge_opts);
>> +
>> +	for (i = 0; i < 3; i++)
>> +		free(mmfs[i].ptr);
>> +
>> +	if (ret > 127 || !orig_blob)
>> +		ret = error(_("content conflict in %s"), path);
> 
> The original only checked if ret is zero or non-zero; here we
> require ret to be large.  Intended?  
> 
> ll_merge() that called ll_xdl_merge() (i.e. the most common case)
> would return the value returned from xdl_merge(), which can be -1
> when we got an error before calling xdl_do_merge().  xdl_do_merge()
> in turn can return -1.  The most common case returns the value
> returned from xdl_cleanup_merge(), which is 0 for clean merge, and
> any positive integer (not clipped to 127 or 128) for conflicted one.
> 

Huh, not sure why I did this, and I'm puzzled that it did not broke
anything.

>> +	/* Create the working tree file, using "our tree" version from
>> +	   the index, and then store the result of the merge. */
> 
> Style. (cf. Documentation/CodingGuidelines).
> 
>> +	ce = index_file_exists(istate, path, strlen(path), 0);
>> +	if (!ce)
>> +		BUG("file is not present in the cache?");
>> +
>> +	unlink(path);
>> +	dest = open(path, O_WRONLY | O_CREAT, ce->ce_mode);
>> +	write_in_full(dest, result.ptr, result.size);
> 
> If open() fails, we write to a bogus file descriptor here.
> 
>> +	close(dest);
>> +
>> +	free(result.ptr);
>> +
>> +	if (ret && our_mode != their_mode)
>> +		return error(_("permission conflict: %o->%o,%o in %s"),
>> +			     orig_mode, our_mode, their_mode, path);
>> +	if (ret)
>> +		return 1;
> 
> What is the error returning convention around here?  Our usual
> convention is that 0 signals a success, and negative reports an
> error.  Returning the value returned from add_file_to_index() below,
> and error() above, are consistent with the convention, but this one
> returns 1 that is not.  When deviating from convention, it needs to
> be documented for the callers in a comment before the function
> definition.
> 

I stayed to close to the shell script on this one…

Note that this is not the case for "resolve" and "octopus", they use the
convention for merge backends, documented in builtin/merge.c:

> 		/*
> 		 * The backend exits with 1 when conflicts are
> 		 * left to be resolved, with 2 when it does not
> 		 * handle the given merge at all.
> 		 */

(In practice, it looks like any non-zero value lower than 2 indicates a
merge conflict, any value greater or equal to 2 is a general failure.)

>> +
>> +	return add_file_to_index(istate, path, 0);
>> +}
> 
> 
> 
>> +int merge_strategies_one_file(struct repository *r,
>> +			      const struct object_id *orig_blob,
>> +			      const struct object_id *our_blob,
>> +			      const struct object_id *their_blob, const char *path,
>> +			      unsigned int orig_mode, unsigned int our_mode,
>> +			      unsigned int their_mode)
>> +{
>> +	if (orig_blob &&
>> +	    ((!their_blob && our_blob && oideq(orig_blob, our_blob)) ||
>> +	     (!our_blob && their_blob && oideq(orig_blob, their_blob))))
>> +		/* Deleted in both or deleted in one and unchanged in
>> +		   the other */
>> +		return merge_one_file_deleted(r->index,
>> +					      orig_blob, our_blob, their_blob, path,
>> +					      orig_mode, our_mode, their_mode);
>> +	else if (!orig_blob && our_blob && !their_blob) {
>> +		/* Added in one.  The other side did not add and we
>> +		   added so there is nothing to be done, except making
>> +		   the path merged. */
>> +		return add_to_index_cacheinfo(r->index, our_mode, our_blob, path);
>> +	} else if (!orig_blob && !our_blob && their_blob) {
>> +		printf(_("Adding %s\n"), path);
>> +
>> +		if (file_exists(path))
>> +			return error(_("untracked %s is overwritten by the merge."), path);
>> +
>> +		if (add_to_index_cacheinfo(r->index, their_mode, their_blob, path))
>> +			return 1;
>> +		return checkout_from_index(r->index, path);
>> +	} else if (!orig_blob && our_blob && their_blob &&
>> +		   oideq(our_blob, their_blob)) {
>> +		/* Added in both, identically (check for same
>> +		   permissions). */
>> +		if (our_mode != their_mode)
>> +			return error(_("File %s added identically in both branches, "
>> +				       "but permissions conflict %o->%o."),
>> +				     path, our_mode, their_mode);
>> +
>> +		printf(_("Adding %s\n"), path);
>> +
>> +		if (add_to_index_cacheinfo(r->index, our_mode, our_blob, path))
>> +			return 1;
>> +		return checkout_from_index(r->index, path);
>> +	} else if (our_blob && their_blob)
>> +		/* Modified in both, but differently. */
>> +		return do_merge_one_file(r->index,
>> +					 orig_blob, our_blob, their_blob, path,
>> +					 orig_mode, our_mode, their_mode);
>> +	else {
>> +		char *orig_hex = "", *our_hex = "", *their_hex = "";
>> +
>> +		if (orig_blob)
>> +			orig_hex = oid_to_hex(orig_blob);
>> +		if (our_blob)
>> +			our_hex = oid_to_hex(our_blob);
>> +		if (their_blob)
>> +			their_hex = oid_to_hex(their_blob);
> 
> Prepare three char [] buffers and use oid_to_hex_r() instead,
> instead of relying that we'd have sufficient number of entries in
> the rotating buffer.
> 
>> +		return error(_("%s: Not handling case %s -> %s -> %s"),
>> +			     path, orig_hex, our_hex, their_hex);
>> +	}
>> +
>> +	return 0;
>> +}
>> diff --git a/merge-strategies.h b/merge-strategies.h
>> new file mode 100644
>> index 0000000000..b527d145c7
>> --- /dev/null
>> +++ b/merge-strategies.h
>> @@ -0,0 +1,13 @@
>> +#ifndef MERGE_STRATEGIES_H
>> +#define MERGE_STRATEGIES_H
>> +
>> +#include "object.h"
>> +
>> +int merge_strategies_one_file(struct repository *r,
>> +			      const struct object_id *orig_blob,
>> +			      const struct object_id *our_blob,
>> +			      const struct object_id *their_blob, const char *path,
>> +			      unsigned int orig_mode, unsigned int our_mode,
>> +			      unsigned int their_mode);
>> +
>> +#endif /* MERGE_STRATEGIES_H */
>> diff --git a/t/t6415-merge-dir-to-symlink.sh b/t/t6415-merge-dir-to-symlink.sh
>> index 2eddcc7664..5fb74e39a0 100755
>> --- a/t/t6415-merge-dir-to-symlink.sh
>> +++ b/t/t6415-merge-dir-to-symlink.sh
>> @@ -94,7 +94,7 @@ test_expect_success SYMLINKS 'a/b was resolved as symlink' '
>>  	test -h a/b
>>  '
>>  
>> -test_expect_failure 'do not lose untracked in merge (resolve)' '
>> +test_expect_success 'do not lose untracked in merge (resolve)' '
>>  	git reset --hard &&
>>  	git checkout baseline^0 &&
>>  	>a/b/c/e &&


^ permalink raw reply

* [PATCH] fetch: do not look for submodule changes in unchanged refs
From: Orgad Shaneh via GitGitGadget @ 2020-09-02 14:23 UTC (permalink / raw)
  To: git; +Cc: Orgad Shaneh, Orgad Shaneh

From: Orgad Shaneh <orgads@gmail.com>

This operation is very expensive, as it scans all the refs using
setup_revisions, which resolves each ref, including checking if it
is ambiguous, or if it is a file name etc.

There is no reason to do all that for refs that haven't changed in this
fetch.

Reported here:
https://public-inbox.org/git/CAGHpTBKSUJzFSWc=uznSu2zB33qCSmKXM-iAjxRCpqNK5bnhRg@mail.gmail.com/

Amends commit be76c2128234d94b47f7087152ee55d08bb65d88.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
---
    fetch: do not look for submodule changes in unchanged refs
    
    This operation is very expensive, as it scans all the refs using
    setup_revisions, which resolves each ref, including checking if it is
    ambiguous, or if it is a file name etc.
    
    There is no reason to do all that for refs that hasn't changed in this
    fetch.
    
    Reported here:
    https://public-inbox.org/git/CAGHpTBKSUJzFSWc=uznSu2zB33qCSmKXM-iAjxRCpqNK5bnhRg@mail.gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-720%2Forgads%2Ffetch-less-submodules-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-720/orgads/fetch-less-submodules-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/720

 builtin/fetch.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0f23dd4b8c..d3f922fc89 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -958,8 +958,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				ref->force = rm->peer_ref->force;
 			}
 
-			if (recurse_submodules != RECURSE_SUBMODULES_OFF)
+			if (recurse_submodules != RECURSE_SUBMODULES_OFF &&
+			    (!rm->peer_ref || !oideq(&ref->old_oid, &ref->new_oid))) {
 				check_for_new_submodule_commits(&rm->old_oid);
+			}
 
 			if (!strcmp(rm->name, "HEAD")) {
 				kind = "";

base-commit: e19713638985533ce461db072b49112da5bd2042
-- 
gitgitgadget

^ permalink raw reply related

* Re: [GSoC][PATCH] submodule: port submodule subcommand 'add' from shell to C
From: Shourya Shukla @ 2020-09-02 12:04 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: gitster, git, christian.couder, johannes.schindelin, liu.denton
In-Reply-To: <31e40c63bbac03d261ac6f46a0d2f6ae90a21038.camel@gmail.com>

On 02/09 02:05, Kaartic Sivaraam wrote:
> On Mon, 2020-08-31 at 18:34 +0530, Shourya Shukla wrote:
> > On 31/08 01:28, Kaartic Sivaraam wrote:
> > 
> > This is what I have done finally:
> > ---
> > 	if (read_cache() < 0)
> > 		die(_("index file corrupt"));
> > 
> > 	if (!force) {
> > 		if (cache_file_exists(path, strlen(path), ignore_case) ||
> > 		    cache_dir_exists(path, strlen(path)))
> > 			die(_("'%s' already exists in the index"), path);
> > 	} else {
> > 		int cache_pos = cache_name_pos(path, strlen(path));
> > 		struct cache_entry *ce = the_index.cache[cache_pos];
> > 		if (cache_pos >= 0 && !S_ISGITLINK(ce->ce_mode))
> > 			die(_("'%s' already exists in the index and is not a "
> > 			      "submodule"), path);
> > 	}
> > ---
> > 
> > I did not put the 'cache_pos >= 0' at the start since I thought that it
> > will unnecessarily increase an indentation level. Since we are using
> > 'cache_{file,dir}_exists' in the first check and 'cache_name_pos()' in
> > the second, the placement of check at another indentation level would be
> > unnecessary. What do you think about this?
> > 
> 
> Interestingly. 'cache_dir_exists' seems to work as expected only when
> the global ignore_case whose value seems to depend on core.ignorecase.
> So, we can't just rely on 'cache_dir_exists to identify a directory
> that has tracked contents. Apparently, the 'directory_exists_in_index'
> in 'dir.c' seems to have the code that we want here (which is also the
> only user of 'index_dir_exists'; the function for which
> 'cache_dir_exists' is a convenience wrapper.

I think both 'cache_{dir,file}_exists()' depend on 'core.ignorecase'
though I am not able to confirm this for 'cache_dir_exists()'. Where
exactly does this happen for the function? The function you mention
seems perfect to me, though, we will also have to make the enum
'exist_status' visible. Will that be fine? The final output will be:
---
	if (!force) {
		if (directory_exists_in_index(&the_index, path, strlen(path)))
			die(_("'%s' already exists in the index"), path);
	} else {
		int cache_pos = cache_name_pos(path, strlen(path));
		struct cache_entry *ce = the_index.cache[cache_pos];
		if (cache_pos >= 0 && !S_ISGITLINK(ce->ce_mode))
			die(_("'%s' already exists in the index and is not a "
			      "submodule"), path);
	}
---


And obviously an extra commit changing the visibility of the function
and the enum.
 
> > > This is more close to what the shell version did but misses one case
> > > which might or might not be covered by the test suite[1]. The case when
> > > path is a directory that has tracked contents. In the shell version we
> > > would get:
> > > 
> > >    $ git submodule add ../git-crypt/ builtin
> > >    'builtin' already exists in the index
> > >    $ git submodule add --force ../git-crypt/ builtin
> > >    'builtin' already exists in the index and is not a submodule
> > > 
> > >    In the C version with the above snippet we get:
> > > 
> > >    $ git submodule add --force ../git-crypt/ builtin
> > >    fatal: 'builtin' does not have a commit checked out
> > >    $ git submodule add ../git-crypt/ builtin
> > >    fatal: 'builtin' does not have a commit checked out
> > > 
> > >    That's not appropriate and should be fixed. I believe we could do
> > >    something with `cache_dir_exists` to fix this.
> > > 
> > > 
> > >    Footnote
> > >    ===
> > > 
> > >    [1]: If it's not covered already, it might be a good idea to add a test
> > >    for the above case.
> > 
> > Like Junio said, we do not care if it is a file or a directory of any
> > sorts, we will give the error if it already exists. Therefore, even if
> > it is an untracked or a tracked one, it should not matter to us. Hence
> > testing for it may not be necessary is what I feel. Why should we test
> > it?
> 
> I'm guessing you misunderstood. A few things:
> 
> - We only care about tracked contents for the case in hand.
> 
> - Identifying whether a given path corresponds to a directory
>   which has tracked contents is tricky. Neither 'cache_name_pos'
>   nor 'cache_file_exists' handle this. 'cache_dir_exists' is also
>   not very useful as mentioned above.
> 
> So, we do have to take care when handling that case as Junio pointed
> out.

I still do not understand this case. Let's say this was our
superproject:

.gitmodules .git/ a.txt dir1/

And we did:
    $ git submodule add <url> dir1/

Now, at this point, how does it matter if 'dir1/' has tracked content or
not right? A directory exists with that name and now we do not add the
SM to that path.


^ permalink raw reply

* Re: [PATCH v2 3/3] ci: stop linking built-ins to the dashed versions
From: Johannes Schindelin @ 2020-09-02  7:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, Johannes Schindelin via GitGitGadget, git
In-Reply-To: <xmqq5z95xu5f.fsf@gitster.c.googlers.com>

Hi Junio,

On Wed, 26 Aug 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Actually, `SKIP_DASHED_BUILT_INS` does not _only_ have an impact on `make
> > install`:
> > ...
> > See how `git-add.exe` is linked in the first, but not in the second run?
>
> OK, that is one more reason why we do want to have 3/3 applied not
> for all tasks in the CI , but for subset of tasks that includes the
> Windows task.  If we had multiple Windows tasks, it may even be
> better to have only to some tasks, and allow other tasks build
> git-add.exe, so that both can be tested for the primary intended
> platform.

If you want to skip this patch, that's fine with me.

But I would like to clarify what I perceive as a misunderstanding: this
patch is not about testing whether it would install the necessary files
or not.

What this patch does is simply to complete the mission of e4597aae6590
(run test suite without dashed git-commands in PATH, 2009-12-02): to make
sure that our very own scripts do not use dashed invocations of built-in
commands.

In that respect, I find it to make more sense to either do it, or not do
it (even if I don't quite understand why we wouldn't do it), instead of
doing it only for one platform.

Ciao,
Dscho

^ permalink raw reply

* Re: Git in Outreachy?
From: Johannes Schindelin @ 2020-09-02  4:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder
In-Reply-To: <20200828065609.GA2105118@coredump.intra.peff.net>

Hi Peff,

On Fri, 28 Aug 2020, Jeff King wrote:

> Are we interested in participating in the December 2020 round of
> Outreachy? The community application period is now open.
>
> I can look into lining up funding, but we'd also need:
>
>   - volunteers to act as mentors
>
>   - updates to our applicant materials (proposed projects, but also
>     microproject / patch suggestions)

Thank you for thinking of me! While I am eager to help, I think I won't be
able to commit to a full term, though.

As to projects, I would like to believe that
https://github.com/gitgitgadget/git/issues has grown into a useful
resource.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] Makefile: add support for generating JSON compilation database
From: Jeff King @ 2020-09-02  8:04 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Junio C Hamano, Philippe Blain via GitGitGadget, git,
	Philippe Blain
In-Reply-To: <20200902013351.GD241078@camp.crustytoothpaste.net>

On Wed, Sep 02, 2020 at 01:33:51AM +0000, brian m. carlson wrote:

> Traditionally, editors had to learn about every language if they wanted
> to add special functionality like refactoring (e.g., renaming "struct
> foo" to "struct bar"), finding all the instances of a type, finding
> where a type or function was declared, or similar IDE features.  When
> Microsoft developed Visual Studio Code, they decided that they did not
> want to implement this functionality for every language under the sun,
> and instead developed the Language Server Protocol[0].
> [...]

Thanks for the explanation. I understand what LSP does, but the missing
link for me was how "here are the command-line flags to the compiler"
turned into something useful like "here's a list of identifiers". And
clangd fills in that gap (presumably re-running the front-end bits of
clang on the fly to pull out that kind of information).

-Peff

^ permalink raw reply

* Re: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()
From: Jeff King @ 2020-09-02  7:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git
In-Reply-To: <xmqq3641ebep.fsf@gitster.c.googlers.com>

On Tue, Sep 01, 2020 at 08:58:54AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > If we do handle it up-front, then I think we'd actually want:
> >
> >   if (!size) {
> >           free(ptr);
> > 	  return xmalloc(0);
> >   }
> >
> > (i.e., to never return NULL for consistency with xmalloc() and
> > xcalloc()).
> 
> Makes sense.  I suspect that this is optimizing for a wrong case,
> but in practice that should not matter.  Not having to worry about
> a request to resize to 0-byte in the remainder of the function is
> actually a plus for readability, I would say.

OK. Then here it is like that (and a slightly updated final paragraph in
the commit message).

There are other variants, too:

  - we could use malloc(1) versus xmalloc(0). Maybe more
    readable/obvious? But also potentially allocates an extra byte when
    the platform malloc(0) would not need to.

  - we could return a non-NULL "ptr" without shrinking it at all (nor
    allocating anything new). This is perfectly legal, and the
    underlying realloc() would still know the original size if anybody
    ever asked to grow it back again.

I have to admit I don't overly care between them. I suspect one of the
reasons we never ran into this 15-year-old bug is that it's quite hard
to convince Git to call realloc(0) in the first place. I only saw it
when investigating a bug in another series, and there the problem turned
out to be reading garbage bytes off the end of a buffer (which we
interpreted as a serialized ewah bitmap which happened to have a zero in
its length byte).

-- >8 --
Subject: [PATCH] xrealloc: do not reuse pointer freed by zero-length realloc()

This patch fixes a bug where xrealloc(ptr, 0) can double-free and
corrupt the heap on some platforms (including at least glibc).

The C99 standard says of malloc (section 7.20.3):

  If the size of the space requested is zero, the behavior is
  implementation-defined: either a null pointer is returned, or the
  behavior is as if the size were some nonzero value, except that the
  returned pointer shall not be used to access an object.

So we might get NULL back, or we might get an actual pointer (but we're
not allowed to look at its contents). To simplify our code, our
xmalloc() handles a NULL return by converting it into a single-byte
allocation. That way callers get consistent behavior. This was done way
back in 4e7a2eccc2 (?alloc: do not return NULL when asked for zero
bytes, 2005-12-29).

We also gave xcalloc() and xrealloc() the same treatment. And according
to C99, that is fine; the text above is in a paragraph that applies to
all three. But what happens to the memory we passed to realloc() in such
a case? I.e., if we do:

  ret = realloc(ptr, 0);

and "ptr" is non-NULL, but we get NULL back, is "ptr" still valid? C99
doesn't cover this case specifically, but says (section 7.20.3.4):

  The realloc function deallocates the old object pointed to by ptr and
  returns a pointer to a new object that has the size specified by size.

So "ptr" is now deallocated, and we must only look at "ret". And since
"ret" is NULL, that means we have no allocated object at all. But that's
not quite the whole story. It also says:

  If memory for the new object cannot be allocated, the old object is
  not deallocated and its value is unchanged.
  [...]
  The realloc function returns a pointer to the new object (which may
  have the same value as a pointer to the old object), or a null pointer
  if the new object could not be allocated.

So if we see a NULL return with a non-zero size, we can expect that the
original object _is_ still valid. But with a non-zero size, it's
ambiguous. The NULL return might mean a failure (in which case the
object is valid), or it might mean that we successfully allocated
nothing, and used NULL to represent that.

The glibc manpage for realloc() explicitly says:

  [...]if size is equal to zero, and ptr is not NULL, then the call is
  equivalent to free(ptr).

Likewise, this StackOverflow answer:

  https://stackoverflow.com/a/2135302

claims that C89 gave similar guidance (but I don't have a copy to verify
it). A comment on this answer:

  https://stackoverflow.com/a/2022410

claims that Microsoft's CRT behaves the same.

But our current "retry with 1 byte" code passes the original pointer
again. So on glibc, we effectively free() the pointer and then try to
realloc() it again, which is undefined behavior.

The simplest fix here is to just pass "ret" (which we know to be NULL)
to the follow-up realloc(). But that means that a system which _doesn't_
free the original pointer would leak it. It's not clear if any such
systems exist, and that interpretation of the standard seems unlikely
(I'd expect a system that doesn't deallocate to simply return the
original pointer in this case). But it's easy enough to err on the safe
side, and just never pass a zero size to realloc() at all.

Signed-off-by: Jeff King <peff@peff.net>
---
 wrapper.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 4ff4a9c3db..bcda41e374 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -117,10 +117,13 @@ void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;
 
+	if (!size) {
+		free(ptr);
+		return xmalloc(0);
+	}
+
 	memory_limit_check(size, 0);
 	ret = realloc(ptr, size);
-	if (!ret && !size)
-		ret = realloc(ptr, 1);
 	if (!ret)
 		die("Out of memory, realloc failed");
 	return ret;
-- 
2.28.0.844.g468840f815


^ permalink raw reply related

* Re: [PATCH v5 5/8] t/helper/test-config: unify exit labels
From: Eric Sunshine @ 2020-09-02  7:30 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren,
	Jonathan Tan, Jonathan Nieder
In-Reply-To: <56535b0e36e94dc73aa570f4f3c0466305c6432f.1599026986.git.matheus.bernardino@usp.br>

On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> test-config's main function has three different exit labels, all of
> which have to perform the same cleanup code before returning. Unify the
> labels in preparation for the next patch which will increase the cleanup
> section.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> @@ -69,16 +69,19 @@ static int early_config_cb(const char *var, const char *value, void *vdata)
> +#define TC_VALUE_NOT_FOUND 1
> +#define TC_CONFIG_FILE_ERROR 2
> +
>  int cmd__config(int argc, const char **argv)
>  {
> +       int i, val, ret = 0;
>
>         if (argc == 3 && !strcmp(argv[1], "read_early_config")) {
>                 read_early_config(early_config_cb, (void *)argv[2]);
> -               return 0;
> +               return ret;
>         }

This one change feels both fragile and increases cognitive load, thus
does not seem desirable. It feels fragile because someone could come
along and insert code above this conditional which assigns some value
other than 0 to 'ret' (not realizing that this conditional wants to
return 0), thus breaking it. It increases cognitive load because it
forces the reader to scan all the code leading up to this point to
determine what value this conditional really wants to return.

Nevertheless, this is a minor objection, not necessarily worth a re-roll.

> @@ -94,10 +97,9 @@ int cmd__config(int argc, const char **argv)
>                         printf("Value not found for \"%s\"\n", argv[2]);
> -                       goto exit1;
> +                       ret = TC_VALUE_NOT_FOUND;

This sort of change, on the other hand, does not increase cognitive
load because it's quite obvious what return value this conditional
wants to return (because it's assigning it explicitly).

^ permalink raw reply

* Re: [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv
From: Eric Sunshine @ 2020-09-02  7:18 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren,
	Jonathan Tan, Jonathan Nieder
In-Reply-To: <0750191342754bcca398c6fdad522616b0f3fbc3.1599026986.git.matheus.bernardino@usp.br>

On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
> Check that we have the expected argc in 'configset_get_value' and
> 'configset_get_value_multi' before trying to access argv elements.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> @@ -138,7 +138,7 @@ int cmd__config(int argc, const char **argv)
> -       } else if (!strcmp(argv[1], "configset_get_value")) {
> +       } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) {
>                 for (i = 3; i < argc; i++) {
> @@ -156,7 +156,7 @@ int cmd__config(int argc, const char **argv)
>                         printf("Value not found for \"%s\"\n", argv[2]);

This is certainly a bug fix since it was accessing argv[2] without
checking that that element was even present, but the more significant
outcome of this change is that it now correctly diagnoses when these
two commands are called with the wrong number of arguments (just like
all the other commands -- except "iterate" -- diagnose incorrect
number of arguments). It might make sense, therefore, for the commit
message to focus on that improvement and mention the out-of-bounds
array access fix as a side-effect. However, that itself is not worth a
re-roll.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox