Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2 2/7] t0410: convert tests to use DEFAULT_REPO_FORMAT prereq
From: Junio C Hamano @ 2024-02-15 18:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <2dd87f3126cedd39cb5b113053c90ee35ae0e5ff.1707985173.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> We have recently introduced a new DEFAULT_REPO_FORMAT prerequisite.
> Despite capturing the intent more directly, it also has the added
> benefit that it can easily be extended in the future in case we add new
> repository extensions. Adapt the tests to use it.

That is a good explanation.  Instead of lifting the prerequisite and
forcing version 0 altogether, we can move by one step (I am puzzled
by "despite" though).

As I said in a separate message, as long as we make sure somebody
will test with version 0, we do not mind if other people test with
higher versions, so in that sense, the patch in v1 may be closer to
what we want to have in the longer term (but we need to figure out
how we "make sure somebody will test with version 0" first).

Thanks, will queue.

> -test_expect_success SHA1,REFFILES 'convert to partial clone with noop extension' '
> +test_expect_success DEFAULT_REPO_FORMAT 'convert to partial clone with noop extension' '
>  	rm -fr server client &&
>  	test_create_repo server &&
>  	test_commit -C server my_commit 1 &&
> @@ -60,7 +60,7 @@ test_expect_success SHA1,REFFILES 'convert to partial clone with noop extension'
>  	git -C client fetch --unshallow --filter="blob:none"
>  '
>  
> -test_expect_success SHA1,REFFILES 'converting to partial clone fails with unrecognized extension' '
> +test_expect_success DEFAULT_REPO_FORMAT 'converting to partial clone fails with unrecognized extension' '
>  	rm -fr server client &&
>  	test_create_repo server &&
>  	test_commit -C server my_commit 1 &&

^ permalink raw reply

* Re: [PATCH] use C99 declaration of variable in for() loop
From: Christian Couder @ 2024-02-15 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elia Pinto, git
In-Reply-To: <xmqqcysxskd9.fsf@gitster.g>

On Thu, Feb 15, 2024 at 6:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elia Pinto <gitter.spiros@gmail.com> writes:
>
> > With the exception of cbtree.c, which would need initial
> > reworking to remove the usage of goto, it expands the
> > use of variable scope reduction in for loops as
> > permitted by the C99 standard, which was first introduced
> > in the git codebase with commit 44ba10d6.
>
> Thanks, but ...
>
> Our test balloon may have proven that nobody will be inconvenienced,
> and it does mean we can be liberal using it when we add new code or
> update existing loops "while at it", but I personally do not think
> such a code churn is very welcome.

Perhaps such changes could be accepted when they are made in only one
file as part of a microproject though?

^ permalink raw reply

* Re: [PATCH 2/2] sequencer: unset GIT_CHERRY_PICK_HELP for 'exec' commands
From: Junio C Hamano @ 2024-02-15 17:36 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Phillip Wood, Kristoffer Haugsbakk, git, Jonathan Nieder
In-Reply-To: <76c62133-30e4-4145-99ea-aeef644d617a@oracle.com>

Vegard Nossum <vegard.nossum@oracle.com> writes:

> On 11/02/2024 18:05, Junio C Hamano wrote:
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>> On 08/02/2024 17:20, Junio C Hamano wrote:
>>>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>> It might be worth explaining why this happens
>>>
>>> This is because rebase sets the GIT_CHERRY_PICK_HELP environment
>>> variable to customize the advice given to users when there are
>>> conflicts which causes the sequencer to remove CHERRY_PICK_HEAD.
>> True.  I'd prefer to see the original submitter assemble the pieces
>> and come up with the final version, rather than me doing so.
>
> Thanks for explaining and sorry for the delay. I saw the patch was
> merged to main now, but I will keep this in mind for next time.

Thanks for finding and fixing.  Hope we'll see more of your
contributions in the future.


^ permalink raw reply

* Re: [PATCH] use C99 declaration of variable in for() loop
From: Junio C Hamano @ 2024-02-15 17:33 UTC (permalink / raw)
  To: Elia Pinto; +Cc: git
In-Reply-To: <20240215094243.147057-1-gitter.spiros@gmail.com>

Elia Pinto <gitter.spiros@gmail.com> writes:

> With the exception of cbtree.c, which would need initial
> reworking to remove the usage of goto, it expands the
> use of variable scope reduction in for loops as
> permitted by the C99 standard, which was first introduced
> in the git codebase with commit 44ba10d6.

Thanks, but ...

Our test balloon may have proven that nobody will be inconvenienced,
and it does mean we can be liberal using it when we add new code or
update existing loops "while at it", but I personally do not think
such a code churn is very welcome.

^ permalink raw reply

* Re: [PATCH] t/lib-credential: clean additional credential
From: Junio C Hamano @ 2024-02-15 17:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Bo Anderson via GitGitGadget, git, Bo Anderson
In-Reply-To: <20240215043900.GA2821179@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Feb 15, 2024 at 01:03:56AM +0000, Bo Anderson via GitGitGadget wrote:
>
>> From: Bo Anderson <mail@boanderson.me>
>> 
>> 71201ab0e5 (t/lib-credential.sh: ensure credential helpers handle long
>> headers, 2023-05-01) added a test which stores credentials with the host
>> victim.example.com but this was never cleaned up, leaving residual data
>> in the credential store after running the tests.
>> 
>> Add a cleanup call for this credential to resolve this issue.
>
> Good catch. The patch looks obviously correct.
>
> I'm not surprised nobody noticed until now, as I expect it is pretty
> rare for people to run t0303 against system helpers (it is not a problem
> for t0301, etc, because they only touch the internal trash directory).
>
> I wonder if we might want something like this, as well, which can catch
> leftovers:

Sounds like a good hygiene ;-).

>
> diff --git a/t/t0302-credential-store.sh b/t/t0302-credential-store.sh
> index 716bf1af9f..4183154243 100755
> --- a/t/t0302-credential-store.sh
> +++ b/t/t0302-credential-store.sh
> @@ -6,6 +6,11 @@ test_description='credential-store tests'
>  
>  helper_test store
>  
> +helper_test_clean store
> +test_expect_success 'test cleanup removes everything' '
> +	test_must_be_empty "$HOME/.git-credentials"
> +'
> +
>  test_expect_success 'when xdg file does not exist, xdg file not created' '
>  	test_path_is_missing "$HOME/.config/git/credentials" &&
>  	test -s "$HOME/.git-credentials"
>
> -Peff

^ permalink raw reply

* Re: [PATCH 2/7] t0410: enable tests with extensions with non-default repo format
From: Junio C Hamano @ 2024-02-15 17:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jonathan Nieder, git
In-Reply-To: <Zc3EbCXFdG8E7_v2@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Feb 14, 2024 at 02:57:55PM -0800, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > In t0410 we have two tests which exercise how partial clones behave in
>> > the context of a repository with extensions. These tests are marked to
>> > require a default repository using SHA1 and the "files" backend because
>> > we explicitly set the repository format version to 0.
>> >
>> > Changing the repository format version to 0 is not needed though. The
>> > "noop" extension is ignored as expected regardless of what the version
>> > is set to, same as the "nonsense" extension leads to failure regardless
>> > of the version.
>> 
>> Isn't the reason why 11664196 kept the forcing of the format version
>> because it wanted to see noop ignored and nonsense failed even if
>> the format version is 0 to ensure the regression it fixed will stay
>> fixed?  IOW, we force version 0 not because we do not want to test
>> with anything but SHA1 and REFFILES; we pretty much assume that with
>> the default version, noop and nonsense will be handled sensibly, and
>> we want to make sure they will be with version 0 as well.
>> 
>> And once we force to version 0, we have trouble running with
>> anything other than SHA1 and REFFILES, hence these prerequisites.
>> 
>> So, I dunno.
>
> Hum, I guess that's fair. Let me adapt the test case to instead use the
> DEFAULT_REPO_FORMAT prerequisite then.

If we expect that "git init" by default will create version 0
repository in the foreseeable future (and when the default gets
fliped to create version 1, we will add CI task that forces version
0 somehow, just like we have special CI task to run test with ref
backend set to reftable), then I think the patch as posted may
actually be a good idea.  As long as somebody is checking these with
a repository in version 0 format, the original motivation behind the
tests is satisified, and if we in addition make sure noop and
nonsense are handled sensibly with higher versions, that would also
be good, too.

So perhaps we need some in-code comment next to these tests to
remind us about that, while removing these prerequisites?

Thanks.

^ permalink raw reply

* Re: [PATCH] git: --no-lazy-fetch option
From: Junio C Hamano @ 2024-02-15 17:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder
In-Reply-To: <20240215053056.GD2821179@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Feb 08, 2024 at 03:17:31PM -0800, Junio C Hamano wrote:
>
>> Sometimes, especially during tests of low level machinery, it is
>> handy to have a way to disable lazy fetching of objects.  This
>> allows us to say, for example, "git cat-file -e <object-name>", to
>> see if the object is locally available.
>
> That seems like a good feature, but...
>
>> @@ -186,6 +187,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>>  			use_pager = 0;
>>  			if (envchanged)
>>  				*envchanged = 1;
>> +		} else if (!strcmp(cmd, "--no-lazy-fetch")) {
>> +			fetch_if_missing = 0;
>>  		} else if (!strcmp(cmd, "--no-replace-objects")) {
>>  			disable_replace_refs();
>>  			setenv(NO_REPLACE_OBJECTS_ENVIRONMENT, "1", 1);
>
> This will only help builtin commands, and even then only the top-level
> one. If I run "git --no-lazy-fetch foo" and "foo" is a script or an
> alias, I'd expect it to still take effect. Ditto for sub-commands kicked
> off by a builtin (say, a "rev-list" connectivity check caused by a
> fetch).
>
> So this probably needs to be modeled after --no-replace-objects, etc,
> where we set an environment variable that makes it to child processes.

Yuck, I was hoping that we can get away with the tiny change only
for builtins,but you're right.

^ permalink raw reply

* Re: [PATCH 1/4] notes: print note blob to stdout directly
From: Jeff King @ 2024-02-15 15:04 UTC (permalink / raw)
  To: Maarten Bosmans; +Cc: Junio C Hamano, git, Teng Long
In-Reply-To: <CA+CvcKR9sH=qZB4oZvX9RWd+4H3Bq8WV_qUOiSj_Tsf=Dr_Xvw@mail.gmail.com>

On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote:

> > How about:
> >
> >   cat some_commit_ids |
> >   git show --stdin -s -z --format='%H%n%N'
> >
> Wouldn't that fail horribly with non-text blobs?

Yes, if you have NULs embedded in your notes then it won't work. Any
batch output format would require byte counts, then. If we wanted to add
a feature to support that, I would suggest one of:

  - teach the pretty-print formatter a new placeholder to output the
    number of bytes in an element. Then you could do something like
    "%H %(size:%N)%n%N", but it would be generally useful for other
    cases, too.

  - teach the pretty-print formatter a variant of %N that outputs only
    the oid of the note, note the note content itself. And then you
    could do something like:

      git log --format='%(note:oid) %H' |
      git cat-file --batch='%(objectname) %(objectsize) %(rest)'

    to get the usual cat-file output of each note blob, but associated
    with the commit it's attached to (the "%(rest)" placeholder for
    cat-file just relays any text found after the object name of each
    line). You might need to do some scripting between the two to handle
    commits with no note.

Of the two, I'd guess that the second one is a lot less work to
implement (on the Git side; on the reading side it's a little more
involved, but still should be a constant number of processes).

One variant of the second one is to use "git notes list". For example,
you can get all notes via cat-file like this right now:

  git notes list |
  git cat-file --batch='%(objectname) %(objectsize) %(rest)'

You can get individual notes by asking for "git notes list <commit>",
but it will only take one at a time. So another easy patch would be
something like (indentation left funny to make the diff more readable):

diff --git a/builtin/notes.c b/builtin/notes.c
index e65cae0bcf..5fdad5fb8f 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix)
 		argc = parse_options(argc, argv, prefix, options,
 				     git_notes_list_usage, 0);
 
-	if (1 < argc) {
-		error(_("too many arguments"));
-		usage_with_options(git_notes_list_usage, options);
-	}
-
 	t = init_notes_check("list", 0);
 	if (argc) {
-		if (repo_get_oid(the_repository, argv[0], &object))
-			die(_("failed to resolve '%s' as a valid ref."), argv[0]);
+		retval = 0;
+		while (*++argv) {
+		if (repo_get_oid(the_repository, *argv, &object))
+			die(_("failed to resolve '%s' as a valid ref."), *argv);
 		note = get_note(t, &object);
 		if (note) {
-			puts(oid_to_hex(note));
-			retval = 0;
+			if (argc > 1)
+				printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object));
+			else
+				puts(oid_to_hex(note));
 		} else
-			retval = error(_("no note found for object %s."),
+			retval |= error(_("no note found for object %s."),
 				       oid_to_hex(&object));
+		}
 	} else
 		retval = for_each_note(t, 0, list_each_note, NULL);
 

That would allow:

  git rev-list ... |
  xargs git notes list |
  git cat-file --batch='%(objectname) %(objectsize) %(rest)'

We could even add a "--stdin" mode to avoid the use of xargs.

-Peff

^ permalink raw reply related

* Re: [PATCH 2/2] sequencer: unset GIT_CHERRY_PICK_HELP for 'exec' commands
From: Vegard Nossum @ 2024-02-15 14:24 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood; +Cc: Kristoffer Haugsbakk, git, Jonathan Nieder
In-Reply-To: <xmqq8r3r9biz.fsf@gitster.g>


On 11/02/2024 18:05, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>> On 08/02/2024 17:20, Junio C Hamano wrote:
>>> Phillip Wood <phillip.wood123@gmail.com> writes:
>> It might be worth explaining why this happens
>>
>> This is because rebase sets the GIT_CHERRY_PICK_HELP environment
>> variable to customize the advice given to users when there are
>> conflicts which causes the sequencer to remove CHERRY_PICK_HEAD.
> 
> True.  I'd prefer to see the original submitter assemble the pieces
> and come up with the final version, rather than me doing so.

Thanks for explaining and sorry for the delay. I saw the patch was
merged to main now, but I will keep this in mind for next time.


Vegard

^ permalink raw reply

* [PATCH v2] mergetools: vimdiff: use correct tool's name when reading mergetool config
From: Kipras Melnikovas @ 2024-02-15 14:20 UTC (permalink / raw)
  To: git; +Cc: greenfoo, Kipras Melnikovas
In-Reply-To: <20240215083917.98218-2-kipras@kipras.org>

The /mergetools/vimdiff script, which handles both vimdiff, nvimdiff
and gvimdiff mergetools (the latter 2 simply source the vimdiff script), has a
function merge_cmd() which read the layout variable from git config, and it
would always read the value of mergetool.**vimdiff**.layout, instead of the
mergetool being currently used (vimdiff or nvimdiff or gvimdiff).

It looks like in 7b5cf8be18 (vimdiff: add tool documentation, 2022-03-30),
we explained the current behavior in Documentation/config/mergetool.txt:

---
mergetool.vimdiff.layout::
	The vimdiff backend uses this variable to control how its split
	windows look like. Applies even if you are using Neovim (`nvim`) or
	gVim (`gvim`) as the merge tool. See BACKEND SPECIFIC HINTS section
---

which makes sense why it's explained this way - the vimdiff backend is used by
gvim and nvim. But the mergetool's configuration should be separate for each tool,
and indeed that's confirmed in same commit at Documentation/mergetools/vimdiff.txt:

---
Variants

Instead of `--tool=vimdiff`, you can also use one of these other variants:
  * `--tool=gvimdiff`, to open gVim instead of Vim.
  * `--tool=nvimdiff`, to open Neovim instead of Vim.

When using these variants, in order to specify a custom layout you will have to
set configuration variables `mergetool.gvimdiff.layout` and
`mergetool.nvimdiff.layout` instead of `mergetool.vimdiff.layout`
---

So it looks like we just forgot to update the 1 part of the vimdiff script
that read the config variable. Cheers.

Though, for backwards-compatibility, I've kept the mergetool.vimdiff
fallback, so that people who unknowingly relied on it, won't have their
setup broken now.

Signed-off-by: Kipras Melnikovas <kipras@kipras.org>
---
Range-diff against v1:
1:  197e42deef ! 1:  070280d95d mergetools: vimdiff: use correct tool's name when reading mergetool config
    @@ Metadata
      ## Commit message ##
         So it looks like we just forgot to update the 1 part of the vimdiff script
         that read the config variable. Cheers.
     
    +    Though, for backwards-compatibility, I've kept the mergetool.vimdiff
    +    fallback, so that people who unknowingly relied on it, won't have their
    +    setup broken now.
    +
         Signed-off-by: Kipras Melnikovas <kipras@kipras.org>
     
    @@ mergetools/vimdiff: diff_cmd_help () {
     -	case "$1" in
     +	layout=$(git config mergetool.$TOOL.layout)
     +
    ++	# backwards-compatibility:
    ++	if test -z "$layout"
    ++	then
    ++		layout=$(git config mergetool.vimdiff.layout)
    ++	fi
    ++
     +	case "$TOOL" in
      	*vimdiff)
      		if test -z "$layout"

 Documentation/config/mergetool.txt |  9 +++++----
 mergetools/vimdiff                 | 12 ++++++++++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/mergetool.txt b/Documentation/config/mergetool.txt
index 294f61efd1..8e3d321a57 100644
--- a/Documentation/config/mergetool.txt
+++ b/Documentation/config/mergetool.txt
@@ -45,10 +45,11 @@ mergetool.meld.useAutoMerge::
 	value of `false` avoids using `--auto-merge` altogether, and is the
 	default value.
 
-mergetool.vimdiff.layout::
-	The vimdiff backend uses this variable to control how its split
-	windows appear. Applies even if you are using Neovim (`nvim`) or
-	gVim (`gvim`) as the merge tool. See BACKEND SPECIFIC HINTS section
+mergetool.{g,n,}vimdiff.layout::
+	The vimdiff backend uses this variable to control how its split windows
+	appear. Use `mergetool.vimdiff` for regular Vim, `mergetool.nvimdiff` for
+	Neovim and `mergetool.gvimdiff` for gVim to configure the merge tool. See
+	BACKEND SPECIFIC HINTS section
 ifndef::git-mergetool[]
 	in linkgit:git-mergetool[1].
 endif::[]
diff --git a/mergetools/vimdiff b/mergetools/vimdiff
index 06937acbf5..0e3058868a 100644
--- a/mergetools/vimdiff
+++ b/mergetools/vimdiff
@@ -371,9 +371,17 @@ diff_cmd_help () {
 
 
 merge_cmd () {
-	layout=$(git config mergetool.vimdiff.layout)
+	TOOL=$1
 
-	case "$1" in
+	layout=$(git config mergetool.$TOOL.layout)
+
+	# backwards-compatibility:
+	if test -z "$layout"
+	then
+		layout=$(git config mergetool.vimdiff.layout)
+	fi
+
+	case "$TOOL" in
 	*vimdiff)
 		if test -z "$layout"
 		then

base-commit: 4fc51f00ef18d2c0174ab2fd39d0ee473fd144bd
-- 
2.43.1


^ permalink raw reply related

* Re: [PATCH 7/7] fsmonitor: addressed comments for patch 1352
From: Patrick Steinhardt @ 2024-02-15 13:49 UTC (permalink / raw)
  To: marzi.esipreh via GitGitGadget
  Cc: git, Eric Sunshine [ ],
	Ævar Arnfjörð Bjarmason [ ], Glen Choo [ ],
	Johannes Schindelin [ ], Taylor Blau [ ], marzi, marzi.esipreh
In-Reply-To: <74a4fa335a7c2014a35be8556887170169360b36.1707992978.git.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 12768 bytes --]

On Thu, Feb 15, 2024 at 10:29:38AM +0000, marzi.esipreh via GitGitGadget wrote:
> From: "marzi.esipreh" <marzi.esipreh@uber.com>
> 
> addressed comments on 1352, rebased, resolved conflicts

Please squash these changes into the preceding commits whereever
required.

Patrick

> 
> Signed-off-by: Marzieh Esipreh <m.ispare63@gmail.com>
> ---
>  compat/fsmonitor/fsm-health-linux.c     |   2 +-
>  compat/fsmonitor/fsm-ipc-unix.c         |   6 +-
>  compat/fsmonitor/fsm-listen-linux.c     | 170 ++++++++++++------------
>  compat/fsmonitor/fsm-path-utils-linux.c |   1 +
>  compat/fsmonitor/fsm-settings-unix.c    |   3 +
>  5 files changed, 95 insertions(+), 87 deletions(-)
> 
> diff --git a/compat/fsmonitor/fsm-health-linux.c b/compat/fsmonitor/fsm-health-linux.c
> index b9f709e8548..4c291f8a066 100644
> --- a/compat/fsmonitor/fsm-health-linux.c
> +++ b/compat/fsmonitor/fsm-health-linux.c
> @@ -1,4 +1,4 @@
> -#include "cache.h"
> +#include "git-compat-util.h"
>  #include "config.h"
>  #include "fsmonitor.h"
>  #include "fsm-health.h"
> diff --git a/compat/fsmonitor/fsm-ipc-unix.c b/compat/fsmonitor/fsm-ipc-unix.c
> index eb25123fa12..70afddfd298 100644
> --- a/compat/fsmonitor/fsm-ipc-unix.c
> +++ b/compat/fsmonitor/fsm-ipc-unix.c
> @@ -1,10 +1,12 @@
> -#include "cache.h"
> +#include "git-compat-util.h"
>  #include "config.h"
>  #include "hex.h"
>  #include "strbuf.h"
>  #include "fsmonitor.h"
>  #include "fsmonitor-ipc.h"
>  #include "fsmonitor-path-utils.h"
> +#include "gettext.h"
> +#include "path.h"
>  
>  static GIT_PATH_FUNC(fsmonitor_ipc__get_default_path, "fsmonitor--daemon.ipc")
>  
> @@ -17,7 +19,7 @@ const char *fsmonitor_ipc__get_path(struct repository *r)
>  	unsigned char hash[GIT_MAX_RAWSZ];
>  
>  	if (!r)
> -		BUG("No repository passed into fsmonitor_ipc__get_path");
> +		BUG("no repository passed into fsmonitor_ipc__get_path");
>  
>  	if (ipc_path)
>  		return ipc_path;
> diff --git a/compat/fsmonitor/fsm-listen-linux.c b/compat/fsmonitor/fsm-listen-linux.c
> index e8548e4e009..84d8fb28d5d 100644
> --- a/compat/fsmonitor/fsm-listen-linux.c
> +++ b/compat/fsmonitor/fsm-listen-linux.c
> @@ -1,7 +1,10 @@
> -#include "cache.h"
> +#include "git-compat-util.h"
> +#include "config.h"
>  #include "fsmonitor.h"
>  #include "fsm-listen.h"
>  #include "fsmonitor--daemon.h"
> +#include "gettext.h"
> +#include "simple-ipc.h"
>  #include <dirent.h>
>  #include <fcntl.h>
>  #include <sys/inotify.h>
> @@ -129,15 +132,15 @@ static void remove_watch(struct watch_entry *w,
>  	hashmap_entry_init(&k1.ent, memhash(&w->wd, sizeof(int)));
>  	w1 = hashmap_remove_entry(&data->watches, &k1, ent, NULL);
>  	if (!w1)
> -		BUG("Double remove of watch for '%s'", w->dir);
> +		BUG("double remove of watch for '%s'", w->dir);
>  
>  	if (w1->cookie)
> -		BUG("Removing watch for '%s' which has a pending rename", w1->dir);
> +		BUG("removing watch for '%s' which has a pending rename", w1->dir);
>  
>  	hashmap_entry_init(&k2.ent, memhash(w->dir, strlen(w->dir)));
>  	w2 = hashmap_remove_entry(&data->revwatches, &k2, ent, NULL);
>  	if (!w2)
> -		BUG("Double remove of reverse watch for '%s'", w->dir);
> +		BUG("double remove of reverse watch for '%s'", w->dir);
>  
>  	/* w1->dir and w2->dir are interned strings, we don't own them */
>  	free(w1);
> @@ -187,7 +190,7 @@ static void add_dir_rename(uint32_t cookie, const char *path,
>  	hashmap_entry_init(&k.ent, memhash(path, strlen(path)));
>  	w = hashmap_get_entry(&data->revwatches, &k, ent, NULL);
>  	if (!w) /* should never happen */
> -		BUG("No watch for '%s'", path);
> +		BUG("no watch for '%s'", path);
>  	w->cookie = cookie;
>  
>  	/* add the pending rename to match against later */
> @@ -224,10 +227,10 @@ static void rename_dir(uint32_t cookie, const char *path,
>  			remove_watch(w, data);
>  			add_watch(path, data);
>  		} else {
> -			BUG("No matching watch");
> +			BUG("no matching watch");
>  		}
>  	} else {
> -		BUG("No matching cookie");
> +		BUG("no matching cookie");
>  	}
>  }
>  
> @@ -249,7 +252,7 @@ static int register_inotify(const char *path,
>  	if (!dir)
>  		return error_errno("opendir('%s') failed", path);
>  
> -	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
> +	while ((de = readdir_skip_dot_and_dotdot(dir))) {
>  		strbuf_reset(&current);
>  		strbuf_addf(&current, "%s/%s", path, de->d_name);
>  		if (lstat(current.buf, &fs)) {
> @@ -353,7 +356,7 @@ static void log_mask_set(const char *path, u_int32_t mask)
>  	if (mask & IN_IGNORED)
>  		strbuf_addstr(&msg, "IN_IGNORED|");
>  	if (mask & IN_ISDIR)
> -		strbuf_addstr(&msg, "IN_ISDIR|");
> +		strbuf_addstr(&msg, "IN_ISDIR");
>  
>  	trace_printf_key(&trace_fsmonitor, "inotify_event: '%s', mask=%#8.8x %s",
>  				path, mask, msg.buf);
> @@ -373,8 +376,10 @@ int fsm_listen__ctor(struct fsmonitor_daemon_state *state)
>  	data->shutdown = SHUTDOWN_ERROR;
>  
>  	fd = inotify_init1(O_NONBLOCK);
> -	if (fd < 0)
> +	if (fd < 0) {
> +		FREE_AND_NULL(data);
>  		return error_errno("inotify_init1() failed");
> +	}
>  
>  	data->fd_inotify = fd;
>  
> @@ -386,12 +391,10 @@ int fsm_listen__ctor(struct fsmonitor_daemon_state *state)
>  		ret = -1;
>  	else if (register_inotify(state->path_worktree_watch.buf, state, NULL))
>  		ret = -1;
> -	else if (state->nr_paths_watching > 1) {
> -		if (add_watch(state->path_gitdir_watch.buf, data))
> -			ret = -1;
> -		else if (register_inotify(state->path_gitdir_watch.buf, state, NULL))
> -			ret = -1;
> -	}
> +	else if (state->nr_paths_watching > 1 &&
> +		 (add_watch(state->path_gitdir_watch.buf, data) ||
> +		  register_inotify(state->path_gitdir_watch.buf, state, NULL)))
> +		ret = -1;
>  
>  	if (!ret) {
>  		state->listen_error_code = 0;
> @@ -449,80 +452,80 @@ static int process_event(const char *path,
>  	const char *last_sep;
>  
>  	switch (fsmonitor_classify_path_absolute(state, path)) {
> -		case IS_INSIDE_DOT_GIT_WITH_COOKIE_PREFIX:
> -		case IS_INSIDE_GITDIR_WITH_COOKIE_PREFIX:
> -			/* Use just the filename of the cookie file. */
> -			last_sep = find_last_dir_sep(path);
> -			string_list_append(cookie_list,
> -					last_sep ? last_sep + 1 : path);
> -			break;
> -		case IS_INSIDE_DOT_GIT:
> -		case IS_INSIDE_GITDIR:
> -			break;
> -		case IS_DOT_GIT:
> -		case IS_GITDIR:
> -			/*
> -			* If .git directory is deleted or renamed away,
> -			* we have to quit.
> -			*/
> -			if (em_dir_deleted(event->mask)) {
> -				trace_printf_key(&trace_fsmonitor,
> -						"event: gitdir removed");
> -				state->listen_data->shutdown = SHUTDOWN_FORCE;
> -				goto done;
> -			}
> +	case IS_INSIDE_DOT_GIT_WITH_COOKIE_PREFIX:
> +	case IS_INSIDE_GITDIR_WITH_COOKIE_PREFIX:
> +		/* Use just the filename of the cookie file. */
> +		last_sep = find_last_dir_sep(path);
> +		string_list_append(cookie_list,
> +				last_sep ? last_sep + 1 : path);
> +		break;
> +	case IS_INSIDE_DOT_GIT:
> +	case IS_INSIDE_GITDIR:
> +		break;
> +	case IS_DOT_GIT:
> +	case IS_GITDIR:
> +		/*
> +		* If .git directory is deleted or renamed away,
> +		* we have to quit.
> +		*/
> +		if (em_dir_deleted(event->mask)) {
> +			trace_printf_key(&trace_fsmonitor,
> +					"event: gitdir removed");
> +			state->listen_data->shutdown = SHUTDOWN_FORCE;
> +			goto done;
> +		}
>  
> -			if (em_dir_renamed(event->mask)) {
> -				trace_printf_key(&trace_fsmonitor,
> -						"event: gitdir renamed");
> -				state->listen_data->shutdown = SHUTDOWN_FORCE;
> -				goto done;
> -			}
> -			break;
> -		case IS_WORKDIR_PATH:
> -			/* normal events in the working directory */
> -			if (trace_pass_fl(&trace_fsmonitor))
> -				log_mask_set(path, event->mask);
> +		if (em_dir_renamed(event->mask)) {
> +			trace_printf_key(&trace_fsmonitor,
> +					"event: gitdir renamed");
> +			state->listen_data->shutdown = SHUTDOWN_FORCE;
> +			goto done;
> +		}
> +		break;
> +	case IS_WORKDIR_PATH:
> +		/* normal events in the working directory */
> +		if (trace_pass_fl(&trace_fsmonitor))
> +			log_mask_set(path, event->mask);
>  
> -			rel = path + state->path_worktree_watch.len + 1;
> -			fsmonitor_batch__add_path(batch, rel);
> +		rel = path + state->path_worktree_watch.len + 1;
> +		fsmonitor_batch__add_path(batch, rel);
>  
> -			if (em_dir_deleted(event->mask))
> -				break;
> +		if (em_dir_deleted(event->mask))
> +			break;
>  
> -			/* received IN_MOVE_FROM, add tracking for expected IN_MOVE_TO */
> -			if (em_rename_dir_from(event->mask))
> -				add_dir_rename(event->cookie, path, state->listen_data);
> +		/* received IN_MOVE_FROM, add tracking for expected IN_MOVE_TO */
> +		if (em_rename_dir_from(event->mask))
> +			add_dir_rename(event->cookie, path, state->listen_data);
>  
> -			/* received IN_MOVE_TO, update watch to reflect new path */
> -			if (em_rename_dir_to(event->mask)) {
> -				rename_dir(event->cookie, path, state->listen_data);
> -				if (register_inotify(path, state, batch)) {
> -					state->listen_data->shutdown = SHUTDOWN_ERROR;
> -					goto done;
> -				}
> +		/* received IN_MOVE_TO, update watch to reflect new path */
> +		if (em_rename_dir_to(event->mask)) {
> +			rename_dir(event->cookie, path, state->listen_data);
> +			if (register_inotify(path, state, batch)) {
> +				state->listen_data->shutdown = SHUTDOWN_ERROR;
> +				goto done;
>  			}
> +		}
>  
> -			if (em_dir_created(event->mask)) {
> -				if (add_watch(path, state->listen_data)) {
> -					state->listen_data->shutdown = SHUTDOWN_ERROR;
> -					goto done;
> -				}
> -				if (register_inotify(path, state, batch)) {
> -					state->listen_data->shutdown = SHUTDOWN_ERROR;
> -					goto done;
> -				}
> +		if (em_dir_created(event->mask)) {
> +			if (add_watch(path, state->listen_data)) {
> +				state->listen_data->shutdown = SHUTDOWN_ERROR;
> +				goto done;
>  			}
> -			break;
> -		case IS_OUTSIDE_CONE:
> -		default:
> -			trace_printf_key(&trace_fsmonitor,
> -					"ignoring '%s'", path);
> -			break;
> +			if (register_inotify(path, state, batch)) {
> +				state->listen_data->shutdown = SHUTDOWN_ERROR;
> +				goto done;
> +			}
> +		}
> +		break;
> +	case IS_OUTSIDE_CONE:
> +	default:
> +		trace_printf_key(&trace_fsmonitor,
> +				"ignoring '%s'", path);
> +		break;
>  	}
>  	return 0;
> -done:
> -	return -1;
> +	done:
> +		return -1;
>  }
>  
>  /*
> @@ -531,7 +534,7 @@ static int process_event(const char *path,
>   */
>  static void handle_events(struct fsmonitor_daemon_state *state)
>  {
> -	 /* See https://man7.org/linux/man-pages/man7/inotify.7.html */
> +	/* See https://man7.org/linux/man-pages/man7/inotify.7.html */
>  	char buf[4096]
>  		__attribute__ ((aligned(__alignof__(struct inotify_event))));
>  
> @@ -539,13 +542,12 @@ static void handle_events(struct fsmonitor_daemon_state *state)
>  	struct fsmonitor_batch *batch = NULL;
>  	struct string_list cookie_list = STRING_LIST_INIT_DUP;
>  	struct watch_entry k, *w;
> -	struct strbuf path;
>  	const struct inotify_event *event;
>  	int fd = state->listen_data->fd_inotify;
>  	ssize_t len;
>  	char *ptr, *p;
>  
> -	strbuf_init(&path, PATH_MAX);
> +	struct strbuf path = STRBUF_INIT;
>  
>  	for(;;) {
>  		len = read(fd, buf, sizeof(buf));
> @@ -581,7 +583,7 @@ static void handle_events(struct fsmonitor_daemon_state *state)
>  
>  			w = hashmap_get_entry(&watches, &k, ent, NULL);
>  			if (!w) /* should never happen */
> -				BUG("No watch for '%s'", event->name);
> +				BUG("no watch for '%s'", event->name);
>  
>  			/* directory watch was removed */
>  			if (em_remove_watch(event->mask)) {
> diff --git a/compat/fsmonitor/fsm-path-utils-linux.c b/compat/fsmonitor/fsm-path-utils-linux.c
> index c21d1349532..0e3b33ffa48 100644
> --- a/compat/fsmonitor/fsm-path-utils-linux.c
> +++ b/compat/fsmonitor/fsm-path-utils-linux.c
> @@ -3,6 +3,7 @@
>  #include "fsmonitor.h"
>  #include "fsmonitor-path-utils.h"
>  #include "fsm-path-utils-linux.h"
> +#include "gettext.h"
>  #include <errno.h>
>  #include <mntent.h>
>  #include <sys/mount.h>
> diff --git a/compat/fsmonitor/fsm-settings-unix.c b/compat/fsmonitor/fsm-settings-unix.c
> index d16dca89416..c9b75aa44fe 100644
> --- a/compat/fsmonitor/fsm-settings-unix.c
> +++ b/compat/fsmonitor/fsm-settings-unix.c
> @@ -1,6 +1,9 @@
> +#include "git-compat-util.h"
> +#include "config.h"
>  #include "fsmonitor.h"
>  #include "fsmonitor-ipc.h"
>  #include "fsmonitor-path-utils.h"
> +#include <stdint.h>
>  
>   /*
>   * For the builtin FSMonitor, we create the Unix domain socket for the
> -- 
> gitgitgadget
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 3/7] fsmonitor: implement filesystem change listener for Linux
From: Patrick Steinhardt @ 2024-02-15 13:49 UTC (permalink / raw)
  To: Eric DeCosta via GitGitGadget
  Cc: git, Eric Sunshine [ ],
	Ævar Arnfjörð Bjarmason [ ], Glen Choo [ ],
	Johannes Schindelin [ ], Taylor Blau [ ], marzi, Eric DeCosta
In-Reply-To: <5fad429b4d53dee4eb509f0db98ef860762436fc.1707992978.git.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 22837 bytes --]

On Thu, Feb 15, 2024 at 10:29:34AM +0000, Eric DeCosta via GitGitGadget wrote:
> From: Eric DeCosta <edecosta@mathworks.com>
> 
> Implement a filesystem change listener for Linux based on the inotify API:
> https://man7.org/linux/man-pages/man7/inotify.7.html
> 
> inotify requires registering a watch on every directory in the worktree and
> special handling of moves/renames.

I assume that fsmonitor is especially important in the context of repos
with large trees, and to the best of my knowledge inotify(7) does not
scale well when installing many watches. I thus have to wonder whether
fanotify(7) would be a better match to implement this nowadays, and what
the considerations were to pick one over the other.

> Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
> ---
>  compat/fsmonitor/fsm-listen-linux.c | 676 ++++++++++++++++++++++++++++
>  1 file changed, 676 insertions(+)
>  create mode 100644 compat/fsmonitor/fsm-listen-linux.c
> 
> diff --git a/compat/fsmonitor/fsm-listen-linux.c b/compat/fsmonitor/fsm-listen-linux.c
> new file mode 100644
> index 00000000000..e8548e4e009
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-listen-linux.c
> @@ -0,0 +1,676 @@
> +#include "cache.h"
> +#include "fsmonitor.h"
> +#include "fsm-listen.h"
> +#include "fsmonitor--daemon.h"
> +#include <dirent.h>
> +#include <fcntl.h>
> +#include <sys/inotify.h>
> +#include <sys/stat.h>
> +
> +/*
> + * Safe value to bitwise OR with rest of mask for
> + * kernels that do not support IN_MASK_CREATE
> + */
> +#ifndef IN_MASK_CREATE
> +#define IN_MASK_CREATE 0x00000000
> +#endif
> +
> +enum shutdown_reason {
> +	SHUTDOWN_CONTINUE = 0,
> +	SHUTDOWN_STOP,
> +	SHUTDOWN_ERROR,
> +	SHUTDOWN_FORCE
> +};
> +
> +struct watch_entry {
> +	struct hashmap_entry ent;
> +	int wd;
> +	uint32_t cookie;
> +	const char *dir;
> +};
> +
> +struct rename_entry {
> +	struct hashmap_entry ent;
> +	time_t whence;
> +	uint32_t cookie;
> +	const char *dir;
> +};
> +
> +struct fsm_listen_data {
> +	int fd_inotify;
> +	enum shutdown_reason shutdown;
> +	struct hashmap watches;
> +	struct hashmap renames;
> +	struct hashmap revwatches;
> +};
> +
> +static int watch_entry_cmp(const void *cmp_data,
> +			  const struct hashmap_entry *eptr,
> +			  const struct hashmap_entry *entry_or_key,
> +			  const void *keydata)
> +{
> +	const struct watch_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct watch_entry, ent);
> +	e2 = container_of(eptr, const struct watch_entry, ent);
> +	return e1->wd != e2->wd;
> +}
> +
> +static int revwatches_entry_cmp(const void *cmp_data,
> +			  const struct hashmap_entry *eptr,
> +			  const struct hashmap_entry *entry_or_key,
> +			  const void *keydata)
> +{
> +	const struct watch_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct watch_entry, ent);
> +	e2 = container_of(eptr, const struct watch_entry, ent);
> +	return strcmp(e1->dir, e2->dir);
> +}
> +
> +static int rename_entry_cmp(const void *cmp_data,
> +			  const struct hashmap_entry *eptr,
> +			  const struct hashmap_entry *entry_or_key,
> +			  const void *keydata)
> +{
> +	const struct rename_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct rename_entry, ent);
> +	e2 = container_of(eptr, const struct rename_entry, ent);
> +	return e1->cookie != e2->cookie;
> +}
> +
> +/*
> + * Register an inotify watch, add watch descriptor to path mapping
> + * and the reverse mapping.
> + */
> +static int add_watch(const char *path, struct fsm_listen_data *data)
> +{
> +	const char *interned = strintern(path);
> +	struct watch_entry *w1, *w2;
> +
> +	/* add the inotify watch, don't allow watches to be modified */
> +	int wd = inotify_add_watch(data->fd_inotify, interned,
> +				(IN_ALL_EVENTS | IN_ONLYDIR | IN_MASK_CREATE)
> +				^ IN_ACCESS ^ IN_CLOSE ^ IN_OPEN);
> +	if (wd < 0)
> +		return error_errno("inotify_add_watch('%s') failed", interned);
> +
> +	/* add watch descriptor -> directory mapping */
> +	CALLOC_ARRAY(w1, 1);
> +	w1->wd = wd;
> +	w1->dir = interned;
> +	hashmap_entry_init(&w1->ent, memhash(&w1->wd, sizeof(int)));
> +	hashmap_add(&data->watches, &w1->ent);
> +
> +	/* add directory -> watch descriptor mapping */
> +	CALLOC_ARRAY(w2, 1);
> +	w2->wd = wd;
> +	w2->dir = interned;
> +	hashmap_entry_init(&w2->ent, memhash(w2->dir, strlen(w2->dir)));
> +	hashmap_add(&data->revwatches, &w2->ent);
> +
> +	return 0;
> +}
> +
> +/*
> + * Remove the inotify watch, the watch descriptor to path mapping
> + * and the reverse mapping.
> + */
> +static void remove_watch(struct watch_entry *w,
> +	struct fsm_listen_data *data)
> +{
> +	struct watch_entry k1, k2, *w1, *w2;
> +
> +	/* remove watch, ignore error if kernel already did it */
> +	if (inotify_rm_watch(data->fd_inotify, w->wd) && errno != EINVAL)
> +		error_errno("inotify_rm_watch() failed");
> +
> +	hashmap_entry_init(&k1.ent, memhash(&w->wd, sizeof(int)));
> +	w1 = hashmap_remove_entry(&data->watches, &k1, ent, NULL);
> +	if (!w1)
> +		BUG("Double remove of watch for '%s'", w->dir);
> +
> +	if (w1->cookie)
> +		BUG("Removing watch for '%s' which has a pending rename", w1->dir);
> +
> +	hashmap_entry_init(&k2.ent, memhash(w->dir, strlen(w->dir)));
> +	w2 = hashmap_remove_entry(&data->revwatches, &k2, ent, NULL);
> +	if (!w2)
> +		BUG("Double remove of reverse watch for '%s'", w->dir);
> +
> +	/* w1->dir and w2->dir are interned strings, we don't own them */
> +	free(w1);
> +	free(w2);
> +}
> +
> +/*
> + * Check for stale directory renames.
> + *
> + * https://man7.org/linux/man-pages/man7/inotify.7.html
> + *
> + * Allow for some small timeout to account for the fact that insertion of the
> + * IN_MOVED_FROM+IN_MOVED_TO event pair is not atomic, and the possibility that
> + * there may not be any IN_MOVED_TO event.
> + *
> + * If the IN_MOVED_TO event is not received within the timeout then events have
> + * been missed and the monitor is in an inconsistent state with respect to the
> + * filesystem.
> + */
> +static int check_stale_dir_renames(struct hashmap *renames, time_t max_age)
> +{
> +	struct rename_entry *re;
> +	struct hashmap_iter iter;
> +
> +	hashmap_for_each_entry(renames, &iter, re, ent) {
> +		if (re->whence <= max_age)
> +			return -1;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Track pending renames.
> + *
> + * Tracking is done via a event cookie to watch descriptor mapping.
> + *
> + * A rename is not complete until matching a IN_MOVED_TO event is received
> + * for a corresponding IN_MOVED_FROM event.
> + */
> +static void add_dir_rename(uint32_t cookie, const char *path,
> +	struct fsm_listen_data *data)

Nit: `add_dir_rename()` and the below `rename_dir()` sound as if we
actually perform the rename ourselves. How about `track_dir_rename()`
and `finalize_dir_rename()`?

> +{
> +	struct watch_entry k, *w;
> +	struct rename_entry *re;
> +
> +	/* lookup the watch descriptor for the given path */
> +	hashmap_entry_init(&k.ent, memhash(path, strlen(path)));
> +	w = hashmap_get_entry(&data->revwatches, &k, ent, NULL);
> +	if (!w) /* should never happen */
> +		BUG("No watch for '%s'", path);

Error message should start with lower-case character.

> +	w->cookie = cookie;
> +
> +	/* add the pending rename to match against later */
> +	CALLOC_ARRAY(re, 1);
> +	re->dir = w->dir;
> +	re->cookie = w->cookie;
> +	re->whence = time(NULL);
> +	hashmap_entry_init(&re->ent, memhash(&re->cookie, sizeof(uint32_t)));
> +	hashmap_add(&data->renames, &re->ent);
> +}
> +
> +/*
> + * Handle directory renames
> + *
> + * Once a IN_MOVED_TO event is received, lookup the rename tracking information
> + * via the event cookie and use this information to update the watch.
> + */
> +static void rename_dir(uint32_t cookie, const char *path,
> +	struct fsm_listen_data *data)
> +{
> +	struct rename_entry rek, *re;
> +	struct watch_entry k, *w;
> +
> +	/* lookup a pending rename to match */
> +	rek.cookie = cookie;
> +	hashmap_entry_init(&rek.ent, memhash(&rek.cookie, sizeof(uint32_t)));
> +	re = hashmap_get_entry(&data->renames, &rek, ent, NULL);
> +	if (re) {
> +		k.dir = re->dir;
> +		hashmap_entry_init(&k.ent, memhash(k.dir, strlen(k.dir)));
> +		w = hashmap_get_entry(&data->revwatches, &k, ent, NULL);
> +		if (w) {
> +			w->cookie = 0; /* rename handled */
> +			remove_watch(w, data);
> +			add_watch(path, data);
> +		} else {
> +			BUG("No matching watch");
> +		}
> +	} else {
> +		BUG("No matching cookie");

The above two bugs should start with a lower-case letter.

> +	}
> +}
> +
> +/*
> + * Recursively add watches to every directory under path
> + */
> +static int register_inotify(const char *path,
> +	struct fsmonitor_daemon_state *state,
> +	struct fsmonitor_batch *batch)
> +{
> +	DIR *dir;
> +	const char *rel;
> +	struct strbuf current = STRBUF_INIT;
> +	struct dirent *de;
> +	struct stat fs;
> +	int ret = -1;
> +
> +	dir = opendir(path);
> +	if (!dir)
> +		return error_errno("opendir('%s') failed", path);
> +
> +	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
> +		strbuf_reset(&current);
> +		strbuf_addf(&current, "%s/%s", path, de->d_name);
> +		if (lstat(current.buf, &fs)) {
> +			error_errno("lstat('%s') failed", current.buf);

Missing `_()` translation marker.

> +			goto failed;
> +		}
> +
> +		/* recurse into directory */
> +		if (S_ISDIR(fs.st_mode)) {
> +			if (add_watch(current.buf, state->listen_data))
> +				goto failed;
> +			if (register_inotify(current.buf, state, batch))
> +				goto failed;
> +		} else if (batch) {
> +			rel = current.buf + state->path_worktree_watch.len + 1;
> +			trace_printf_key(&trace_fsmonitor, "explicitly adding '%s'", rel);
> +			fsmonitor_batch__add_path(batch, rel);
> +		}
> +	}
> +	ret = 0;
> +
> +failed:
> +	strbuf_release(&current);
> +	if (closedir(dir) < 0)
> +		return error_errno("closedir('%s') failed", path);

Missing `_()` translation marker.

> +	return ret;
> +}
> +
> +static int em_rename_dir_from(u_int32_t mask)
> +{
> +	return ((mask & IN_ISDIR) && (mask & IN_MOVED_FROM));
> +}
> +
> +static int em_rename_dir_to(u_int32_t mask)
> +{
> +	return ((mask & IN_ISDIR) && (mask & IN_MOVED_TO));
> +}
> +
> +static int em_remove_watch(u_int32_t mask)
> +{
> +	return (mask & IN_DELETE_SELF);
> +}
> +
> +static int em_dir_renamed(u_int32_t mask)
> +{
> +	return ((mask & IN_ISDIR) && (mask & IN_MOVE));
> +}
> +
> +static int em_dir_created(u_int32_t mask)
> +{
> +	return ((mask & IN_ISDIR) && (mask & IN_CREATE));
> +}
> +
> +static int em_dir_deleted(uint32_t mask)
> +{
> +	return ((mask & IN_ISDIR) && (mask & IN_DELETE));
> +}
> +
> +static int em_force_shutdown(u_int32_t mask)
> +{
> +	return (mask & IN_UNMOUNT) || (mask & IN_Q_OVERFLOW);
> +}
> +
> +static int em_ignore(u_int32_t mask)
> +{
> +	return (mask & IN_IGNORED) || (mask & IN_MOVE_SELF);
> +}
> +
> +static void log_mask_set(const char *path, u_int32_t mask)
> +{
> +	struct strbuf msg = STRBUF_INIT;
> +
> +	if (mask & IN_ACCESS)
> +		strbuf_addstr(&msg, "IN_ACCESS|");
> +	if (mask & IN_MODIFY)
> +		strbuf_addstr(&msg, "IN_MODIFY|");
> +	if (mask & IN_ATTRIB)
> +		strbuf_addstr(&msg, "IN_ATTRIB|");
> +	if (mask & IN_CLOSE_WRITE)
> +		strbuf_addstr(&msg, "IN_CLOSE_WRITE|");
> +	if (mask & IN_CLOSE_NOWRITE)
> +		strbuf_addstr(&msg, "IN_CLOSE_NOWRITE|");
> +	if (mask & IN_OPEN)
> +		strbuf_addstr(&msg, "IN_OPEN|");
> +	if (mask & IN_MOVED_FROM)
> +		strbuf_addstr(&msg, "IN_MOVED_FROM|");
> +	if (mask & IN_MOVED_TO)
> +		strbuf_addstr(&msg, "IN_MOVED_TO|");
> +	if (mask & IN_CREATE)
> +		strbuf_addstr(&msg, "IN_CREATE|");
> +	if (mask & IN_DELETE)
> +		strbuf_addstr(&msg, "IN_DELETE|");
> +	if (mask & IN_DELETE_SELF)
> +		strbuf_addstr(&msg, "IN_DELETE_SELF|");
> +	if (mask & IN_MOVE_SELF)
> +		strbuf_addstr(&msg, "IN_MOVE_SELF|");
> +	if (mask & IN_UNMOUNT)
> +		strbuf_addstr(&msg, "IN_UNMOUNT|");
> +	if (mask & IN_Q_OVERFLOW)
> +		strbuf_addstr(&msg, "IN_Q_OVERFLOW|");
> +	if (mask & IN_IGNORED)
> +		strbuf_addstr(&msg, "IN_IGNORED|");
> +	if (mask & IN_ISDIR)
> +		strbuf_addstr(&msg, "IN_ISDIR|");

Doesn't this end up with one trailing '|' in `msg`? You could use
`strbuf_strip_suffix(msg, "|")` to drop it.

> +	trace_printf_key(&trace_fsmonitor, "inotify_event: '%s', mask=%#8.8x %s",
> +				path, mask, msg.buf);
> +
> +	strbuf_release(&msg);
> +}
> +
> +int fsm_listen__ctor(struct fsmonitor_daemon_state *state)
> +{
> +	int fd;
> +	int ret = 0;
> +	struct fsm_listen_data *data;
> +
> +	CALLOC_ARRAY(data, 1);
> +	state->listen_data = data;
> +	state->listen_error_code = -1;
> +	data->shutdown = SHUTDOWN_ERROR;
> +
> +	fd = inotify_init1(O_NONBLOCK);
> +	if (fd < 0)
> +		return error_errno("inotify_init1() failed");
> +
> +	data->fd_inotify = fd;
> +
> +	hashmap_init(&data->watches, watch_entry_cmp, NULL, 0);
> +	hashmap_init(&data->renames, rename_entry_cmp, NULL, 0);
> +	hashmap_init(&data->revwatches, revwatches_entry_cmp, NULL, 0);
> +
> +	if (add_watch(state->path_worktree_watch.buf, data))
> +		ret = -1;
> +	else if (register_inotify(state->path_worktree_watch.buf, state, NULL))
> +		ret = -1;
> +	else if (state->nr_paths_watching > 1) {
> +		if (add_watch(state->path_gitdir_watch.buf, data))
> +			ret = -1;
> +		else if (register_inotify(state->path_gitdir_watch.buf, state, NULL))
> +			ret = -1;
> +	}

Style: if one of the branches requires braces then all of them should.

> +
> +	if (!ret) {
> +		state->listen_error_code = 0;
> +		data->shutdown = SHUTDOWN_CONTINUE;
> +	}
> +
> +	return ret;
> +}
> +
> +void fsm_listen__dtor(struct fsmonitor_daemon_state *state)
> +{
> +	struct fsm_listen_data *data;
> +	struct hashmap_iter iter;
> +	struct watch_entry *w;
> +	int fd;
> +
> +	if (!state || !state->listen_data)
> +		return;
> +
> +	data = state->listen_data;
> +	fd = data->fd_inotify;
> +
> +	hashmap_for_each_entry(&data->watches, &iter, w, ent) {
> +		w->cookie = 0; /* ignore any pending renames */
> +		remove_watch(w, data);
> +	}
> +	hashmap_clear(&data->watches);
> +
> +	hashmap_clear(&data->revwatches); /* remove_watch freed the entries */
> +
> +	hashmap_clear_and_free(&data->renames, struct rename_entry, ent);
> +
> +	FREE_AND_NULL(state->listen_data);

The empty lines between all these cleanups  can probably be removed.

> +
> +	if (fd && (close(fd) < 0))
> +		error_errno(_("closing inotify file descriptor failed"));
> +}
> +
> +void fsm_listen__stop_async(struct fsmonitor_daemon_state *state)
> +{
> +	if (!state->listen_data->shutdown)
> +		state->listen_data->shutdown = SHUTDOWN_STOP;
> +}
> +
> +/*
> + * Process a single inotify event and queue for publication.
> + */
> +static int process_event(const char *path,
> +	const struct inotify_event *event,
> +	struct fsmonitor_batch *batch,
> +	struct string_list *cookie_list,
> +	struct fsmonitor_daemon_state *state)
> +{
> +	const char *rel;
> +	const char *last_sep;
> +
> +	switch (fsmonitor_classify_path_absolute(state, path)) {
> +		case IS_INSIDE_DOT_GIT_WITH_COOKIE_PREFIX:
> +		case IS_INSIDE_GITDIR_WITH_COOKIE_PREFIX:
> +			/* Use just the filename of the cookie file. */
> +			last_sep = find_last_dir_sep(path);
> +			string_list_append(cookie_list,
> +					last_sep ? last_sep + 1 : path);
> +			break;
> +		case IS_INSIDE_DOT_GIT:
> +		case IS_INSIDE_GITDIR:
> +			break;
> +		case IS_DOT_GIT:
> +		case IS_GITDIR:
> +			/*
> +			* If .git directory is deleted or renamed away,
> +			* we have to quit.
> +			*/
> +			if (em_dir_deleted(event->mask)) {
> +				trace_printf_key(&trace_fsmonitor,
> +						"event: gitdir removed");
> +				state->listen_data->shutdown = SHUTDOWN_FORCE;
> +				goto done;
> +			}
> +
> +			if (em_dir_renamed(event->mask)) {
> +				trace_printf_key(&trace_fsmonitor,
> +						"event: gitdir renamed");
> +				state->listen_data->shutdown = SHUTDOWN_FORCE;
> +				goto done;
> +			}
> +			break;
> +		case IS_WORKDIR_PATH:
> +			/* normal events in the working directory */
> +			if (trace_pass_fl(&trace_fsmonitor))
> +				log_mask_set(path, event->mask);
> +
> +			rel = path + state->path_worktree_watch.len + 1;
> +			fsmonitor_batch__add_path(batch, rel);
> +
> +			if (em_dir_deleted(event->mask))
> +				break;
> +
> +			/* received IN_MOVE_FROM, add tracking for expected IN_MOVE_TO */
> +			if (em_rename_dir_from(event->mask))
> +				add_dir_rename(event->cookie, path, state->listen_data);
> +
> +			/* received IN_MOVE_TO, update watch to reflect new path */
> +			if (em_rename_dir_to(event->mask)) {
> +				rename_dir(event->cookie, path, state->listen_data);
> +				if (register_inotify(path, state, batch)) {
> +					state->listen_data->shutdown = SHUTDOWN_ERROR;
> +					goto done;
> +				}
> +			}
> +
> +			if (em_dir_created(event->mask)) {
> +				if (add_watch(path, state->listen_data)) {
> +					state->listen_data->shutdown = SHUTDOWN_ERROR;
> +					goto done;
> +				}
> +				if (register_inotify(path, state, batch)) {
> +					state->listen_data->shutdown = SHUTDOWN_ERROR;
> +					goto done;
> +				}
> +			}
> +			break;
> +		case IS_OUTSIDE_CONE:
> +		default:
> +			trace_printf_key(&trace_fsmonitor,
> +					"ignoring '%s'", path);
> +			break;
> +	}
> +	return 0;
> +done:
> +	return -1;
> +}
> +
> +/*
> + * Read the inotify event stream and pre-process events before further
> + * processing and eventual publishing.
> + */
> +static void handle_events(struct fsmonitor_daemon_state *state)
> +{
> +	 /* See https://man7.org/linux/man-pages/man7/inotify.7.html */
> +	char buf[4096]
> +		__attribute__ ((aligned(__alignof__(struct inotify_event))));
> +
> +	struct hashmap watches = state->listen_data->watches;
> +	struct fsmonitor_batch *batch = NULL;
> +	struct string_list cookie_list = STRING_LIST_INIT_DUP;
> +	struct watch_entry k, *w;
> +	struct strbuf path;
> +	const struct inotify_event *event;
> +	int fd = state->listen_data->fd_inotify;
> +	ssize_t len;
> +	char *ptr, *p;

I think many of the variables could be moved into deeper scopes. Makes
it easier to assess what is being used where.

> +
> +	strbuf_init(&path, PATH_MAX);
> +
> +	for(;;) {
> +		len = read(fd, buf, sizeof(buf));
> +		if (len == -1 && errno != EAGAIN) {

Wouldn't we also have to handle EINTR here?

> +			error_errno(_("reading inotify message stream failed"));
> +			state->listen_data->shutdown = SHUTDOWN_ERROR;
> +			goto done;
> +		}
> +
> +		/* nothing to read */
> +		if (len <= 0)
> +			goto done;
> +
> +		/* Loop over all events in the buffer. */
> +		for (ptr = buf; ptr < buf + len;
> +			 ptr += sizeof(struct inotify_event) + event->len) {

Nit: there's an additional whitespace here.

> +
> +			event = (const struct inotify_event *) ptr;
> +
> +			if (em_ignore(event->mask))
> +				continue;
> +
> +			/* File system was unmounted or event queue overflowed */
> +			if (em_force_shutdown(event->mask)) {
> +				if (trace_pass_fl(&trace_fsmonitor))
> +					log_mask_set("Forcing shutdown", event->mask);
> +				state->listen_data->shutdown = SHUTDOWN_FORCE;
> +				goto done;
> +			}
> +
> +			hashmap_entry_init(&k.ent, memhash(&event->wd, sizeof(int)));
> +			k.wd = event->wd;
> +
> +			w = hashmap_get_entry(&watches, &k, ent, NULL);
> +			if (!w) /* should never happen */
> +				BUG("No watch for '%s'", event->name);

Error messages should start with a lower-case letter.

> +
> +			/* directory watch was removed */
> +			if (em_remove_watch(event->mask)) {
> +				remove_watch(w, state->listen_data);
> +				continue;
> +			}
> +
> +			strbuf_reset(&path);
> +			strbuf_add(&path, w->dir, strlen(w->dir));
> +			strbuf_addch(&path, '/');
> +			strbuf_addstr(&path, event->name);

The above three lines can be simplified to:

```
strbuf_addf("%s/%s", w->dir, event->name);
```

> +
> +			p = fsmonitor__resolve_alias(path.buf, &state->alias);
> +			if (!p)
> +				p = strbuf_detach(&path, NULL);
> +
> +			if (!batch)
> +				batch = fsmonitor_batch__new();
> +
> +			if (process_event(p, event, batch, &cookie_list, state)) {
> +				free(p);
> +				goto done;
> +			}
> +			free(p);
> +		}
> +		strbuf_reset(&path);
> +		fsmonitor_publish(state, batch, &cookie_list);
> +		string_list_clear(&cookie_list, 0);
> +		batch = NULL;
> +	}
> +done:
> +	strbuf_release(&path);
> +	fsmonitor_batch__free_list(batch);
> +	string_list_clear(&cookie_list, 0);
> +}
> +
> +/*
> + * Non-blocking read of the inotify events stream. The inotify fd is polled
> + * frequently to help minimize the number of queue overflows.
> + */
> +void fsm_listen__loop(struct fsmonitor_daemon_state *state)
> +{
> +	int poll_num;
> +	const int interval = 1000;

You could rename the variable to `interval_ms` to clarify its unit.

> +	time_t checked = time(NULL);
> +	struct pollfd fds[1];
> +	fds[0].fd = state->listen_data->fd_inotify;
> +	fds[0].events = POLLIN;
> +
> +	for(;;) {

Nit: `while (1)`

> +		switch (state->listen_data->shutdown) {
> +			case SHUTDOWN_CONTINUE:
> +				poll_num = poll(fds, 1, 1);

Why do you pick a timeout of 1 millisecond here? I'd have expected us to
instruct poll(3p) to either not block at all (0) or to block for
`interval`.

> +				if (poll_num == -1) {
> +					if (errno == EINTR)
> +						continue;

Wouldn't we also have to handle EAGAIN here?

Patrick

> +					error_errno(_("polling inotify message stream failed"));
> +					state->listen_data->shutdown = SHUTDOWN_ERROR;
> +					continue;
> +				}
> +
> +				if ((time(NULL) - checked) >= interval) {
> +					checked = time(NULL);
> +					if (check_stale_dir_renames(&state->listen_data->renames,
> +						checked - interval)) {
> +						trace_printf_key(&trace_fsmonitor,
> +							"Missed IN_MOVED_TO events, forcing shutdown");
> +						state->listen_data->shutdown = SHUTDOWN_FORCE;
> +						continue;
> +					}
> +				}
> +
> +				if (poll_num > 0 && (fds[0].revents & POLLIN))
> +					handle_events(state);
> +
> +				continue;
> +			case SHUTDOWN_ERROR:
> +				state->listen_error_code = -1;
> +				ipc_server_stop_async(state->ipc_server_data);
> +				break;
> +			case SHUTDOWN_FORCE:
> +				state->listen_error_code = 0;
> +				ipc_server_stop_async(state->ipc_server_data);
> +				break;
> +			case SHUTDOWN_STOP:
> +			default:
> +				state->listen_error_code = 0;
> +				break;
> +		}
> +		return;
> +	}
> +}
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/7] fsmonitor: determine if filesystem is local or remote
From: Patrick Steinhardt @ 2024-02-15 13:49 UTC (permalink / raw)
  To: Eric DeCosta via GitGitGadget
  Cc: git, Eric Sunshine [ ],
	Ævar Arnfjörð Bjarmason [ ], Glen Choo [ ],
	Johannes Schindelin [ ], Taylor Blau [ ], marzi, Eric DeCosta
In-Reply-To: <d26de10866662a5bcd16d562cd1063dedd21cf02.1707992978.git.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9894 bytes --]

On Thu, Feb 15, 2024 at 10:29:33AM +0000, Eric DeCosta via GitGitGadget wrote:
> From: Eric DeCosta <edecosta@mathworks.com>
> 
> Compare the given path to the mounted filesystems. Find the mount that is
> the longest prefix of the path (if any) and determine if that mount is on a
> local or remote filesystem.

It would be nice to motivate this change in the commit message. Right
now it only explains what the commit does, but it does not mention at
all why that would be a good idea in the first place. Explaining that
this is part of the interface that the existing fsmonitor infrastructure
expects to exist would help.

> Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
> ---
>  Makefile                                |   4 +
>  compat/fsmonitor/fsm-path-utils-linux.c | 195 ++++++++++++++++++++++++
>  compat/fsmonitor/fsm-path-utils-linux.h |  91 +++++++++++
>  config.mak.uname                        |  11 ++
>  4 files changed, 301 insertions(+)
>  create mode 100644 compat/fsmonitor/fsm-path-utils-linux.c
>  create mode 100644 compat/fsmonitor/fsm-path-utils-linux.h
> 
> diff --git a/Makefile b/Makefile
> index 78e874099d9..0f36a0fd83a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2088,6 +2088,10 @@ ifdef HAVE_CLOCK_GETTIME
>  	BASIC_CFLAGS += -DHAVE_CLOCK_GETTIME
>  endif
>  
> +ifdef HAVE_LINUX_MAGIC_H
> +	BASIC_CFLAGS += -DHAVE_LINUX_MAGIC_H
> +endif
> +
>  ifdef HAVE_CLOCK_MONOTONIC
>  	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
>  endif
> diff --git a/compat/fsmonitor/fsm-path-utils-linux.c b/compat/fsmonitor/fsm-path-utils-linux.c
> new file mode 100644
> index 00000000000..c21d1349532
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-path-utils-linux.c
> @@ -0,0 +1,195 @@
> +#include "git-compat-util.h"
> +#include "abspath.h"
> +#include "fsmonitor.h"
> +#include "fsmonitor-path-utils.h"
> +#include "fsm-path-utils-linux.h"
> +#include <errno.h>
> +#include <mntent.h>
> +#include <sys/mount.h>
> +#include <sys/vfs.h>
> +#include <sys/statvfs.h>
> +
> +static int is_remote_fs(const char *path)
> +{
> +	struct statfs fs;
> +
> +	if (statfs(path, &fs))
> +		return error_errno(_("statfs('%s') failed"), path);
> +
> +	switch (fs.f_type) {
> +	case ACFS_SUPER_MAGIC:
> +	case AFS_SUPER_MAGIC:
> +	case CEPH_SUPER_MAGIC:
> +	case CIFS_SUPER_MAGIC:
> +	case CODA_SUPER_MAGIC:
> +	case FHGFS_SUPER_MAGIC:
> +	case GFS_SUPER_MAGIC:
> +	case GPFS_SUPER_MAGIC:
> +	case IBRIX_SUPER_MAGIC:
> +	case KAFS_SUPER_MAGIC:
> +	case LUSTRE_SUPER_MAGIC:
> +	case NCP_SUPER_MAGIC:
> +	case NFS_SUPER_MAGIC:
> +	case NFSD_SUPER_MAGIC:
> +	case OCFS2_SUPER_MAGIC:
> +	case PANFS_SUPER_MAGIC:
> +	case SMB_SUPER_MAGIC:
> +	case SMB2_SUPER_MAGIC:
> +	case SNFS_SUPER_MAGIC:
> +	case VMHGFS_SUPER_MAGIC:
> +	case VXFS_SUPER_MAGIC:
> +		return 1;
> +	default:
> +		return 0;
> +	}
> +}

This list doesn't feel all that maintainable to me, but so be it if
there is no better interface available.

> +static int find_mount(const char *path, const struct statvfs *fs,
> +			struct mntent *entry)

I don't quite understand why `find_mount()` is required in the first
place. Why can't we statfs(2) the path directly? This syscall provides
the fsid and should be sufficient for us to fill in `struct fs_info`.

Explaining details like this in the commit message would help guide the
reader's expectations.

Patrick

> +{
> +	const char *const mounts = "/proc/mounts";
> +	char *rp = real_pathdup(path, 1);
> +	struct mntent *ment = NULL;
> +	struct statvfs mntfs;
> +	FILE *fp;
> +	int found = 0;
> +	int ret = 0;
> +	size_t dlen, plen, flen = 0;
> +
> +	entry->mnt_fsname = NULL;
> +	entry->mnt_dir = NULL;
> +	entry->mnt_type = NULL;
> +
> +	fp = setmntent(mounts, "r");
> 
> +	if (!fp) {
> +		free(rp);
> +		return error_errno(_("setmntent('%s') failed"), mounts);
> +	}
> +
> +	plen = strlen(rp);
> +
> +	/* read all the mount information and compare to path */
> +	while ((ment = getmntent(fp))) {
> +		if (statvfs(ment->mnt_dir, &mntfs)) {
> +			switch (errno) {
> +			case EPERM:
> +			case ESRCH:
> +			case EACCES:
> +				continue;
> +			default:
> +				error_errno(_("statvfs('%s') failed"), ment->mnt_dir);
> +				ret = -1;
> +				goto done;
> +			}
> +		}
> +
> +		/* is mount on the same filesystem and is a prefix of the path */
> +		if ((fs->f_fsid == mntfs.f_fsid) &&
> +			!strncmp(ment->mnt_dir, rp, strlen(ment->mnt_dir))) {
> +			dlen = strlen(ment->mnt_dir);
> +			if (dlen > plen)
> +				continue;
> +			/*
> +			 * look for the longest prefix (including root)
> +			 */
> +			if (dlen > flen &&
> +				((dlen == 1 && ment->mnt_dir[0] == '/') ||
> +				 (!rp[dlen] || rp[dlen] == '/'))) {
> +				flen = dlen;
> +				found = 1;
> +
> +				/*
> +				 * https://man7.org/linux/man-pages/man3/getmntent.3.html
> +				 *
> +				 * The pointer points to a static area of memory which is
> +				 * overwritten by subsequent calls to getmntent().
> +				 */
> +				free(entry->mnt_fsname);
> +				free(entry->mnt_dir);
> +				free(entry->mnt_type);
> +				entry->mnt_fsname = xstrdup(ment->mnt_fsname);
> +				entry->mnt_dir = xstrdup(ment->mnt_dir);
> +				entry->mnt_type = xstrdup(ment->mnt_type);
> +			}
> +		}
> +	}
> +
> +done:
> +	free(rp);
> +	endmntent(fp);
> +
> +	if (!found)
> +		return -1;
> +
> +	return ret;
> +}
> +
> +int fsmonitor__get_fs_info(const char *path, struct fs_info *fs_info)
> +{
> +	int ret = 0;
> +	struct mntent entry;
> +	struct statvfs fs;
> +
> +	fs_info->is_remote = -1;
> +	fs_info->typename = NULL;
> +
> +	if (statvfs(path, &fs))
> +		return error_errno(_("statvfs('%s') failed"), path);
> +
> +	if (find_mount(path, &fs, &entry) < 0) {
> +		ret = -1;
> +		goto done;
> +	}
> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "statvfs('%s') [flags 0x%08lx] '%s' '%s'",
> +			 path, fs.f_flag, entry.mnt_type, entry.mnt_fsname);
> +
> +	fs_info->is_remote = is_remote_fs(entry.mnt_dir);
> +	fs_info->typename = xstrdup(entry.mnt_fsname);
> +
> +	if (fs_info->is_remote < 0)
> +		ret = -1;
> +
> +	trace_printf_key(&trace_fsmonitor,
> +				"'%s' is_remote: %d",
> +				path, fs_info->is_remote);
> +
> +done:
> +	free(entry.mnt_fsname);
> +	free(entry.mnt_dir);
> +	free(entry.mnt_type);
> +	return ret;
> +}
> +
> +int fsmonitor__is_fs_remote(const char *path)
> +{
> +	int ret = 0;
> +	struct fs_info fs;
> +
> +	if (fsmonitor__get_fs_info(path, &fs))
> +		ret = -1;
> +	else
> +		ret = fs.is_remote;
> +
> +	free(fs.typename);
> +
> +	return ret;
> +}
> +
> +/*
> + * No-op for now.
> + */
> +int fsmonitor__get_alias(const char *path, struct alias_info *info)
> +{
> +	return 0;
> +}
> +
> +/*
> + * No-op for now.
> + */
> +char *fsmonitor__resolve_alias(const char *path,
> +		const struct alias_info *info)
> +{
> +	return NULL;
> +}
> diff --git a/compat/fsmonitor/fsm-path-utils-linux.h b/compat/fsmonitor/fsm-path-utils-linux.h
> new file mode 100644
> index 00000000000..49bdb3c4728
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-path-utils-linux.h
> @@ -0,0 +1,91 @@
> +#ifndef FSM_PATH_UTILS_LINUX_H
> +#define FSM_PATH_UTILS_LINUX_H
> +#endif
> +
> +#ifdef HAVE_LINUX_MAGIC_H
> +#include <linux/magic.h>
> +#endif
> +
> +#ifndef ACFS_SUPER_MAGIC
> +#define ACFS_SUPER_MAGIC 0x61636673
> +#endif
> +
> +#ifndef AFS_SUPER_MAGIC
> +#define AFS_SUPER_MAGIC 0x5346414f
> +#endif
> +
> +#ifndef CEPH_SUPER_MAGIC
> +#define CEPH_SUPER_MAGIC 0x00c36400
> +#endif
> +
> +#ifndef CIFS_SUPER_MAGIC
> +#define CIFS_SUPER_MAGIC 0xff534d42
> +#endif
> +
> +#ifndef CODA_SUPER_MAGIC
> +#define CODA_SUPER_MAGIC 0x73757245
> +#endif
> +
> +#ifndef FHGFS_SUPER_MAGIC
> +#define FHGFS_SUPER_MAGIC 0x19830326
> +#endif
> +
> +#ifndef GFS_SUPER_MAGIC
> +#define GFS_SUPER_MAGIC 0x1161970
> +#endif
> +
> +#ifndef GPFS_SUPER_MAGIC
> +#define GPFS_SUPER_MAGIC 0x47504653
> +#endif
> +
> +#ifndef IBRIX_SUPER_MAGIC
> +#define IBRIX_SUPER_MAGIC 0x013111a8
> +#endif
> +
> +#ifndef KAFS_SUPER_MAGIC
> +#define KAFS_SUPER_MAGIC 0x6b414653
> +#endif
> +
> +#ifndef LUSTRE_SUPER_MAGIC
> +#define LUSTRE_SUPER_MAGIC 0x0bd00bd0
> +#endif
> +
> +#ifndef NCP_SUPER_MAGIC
> +#define NCP_SUPER_MAGIC 0x564c
> +#endif
> +
> +#ifndef NFS_SUPER_MAGIC
> +#define NFS_SUPER_MAGIC 0x6969
> +#endif
> +
> +#ifndef NFSD_SUPER_MAGIC
> +#define NFSD_SUPER_MAGIC 0x6e667364
> +#endif
> +
> +#ifndef OCFS2_SUPER_MAGIC
> +#define OCFS2_SUPER_MAGIC 0x7461636f
> +#endif
> +
> +#ifndef PANFS_SUPER_MAGIC
> +#define PANFS_SUPER_MAGIC 0xaad7aaea
> +#endif
> +
> +#ifndef SMB_SUPER_MAGIC
> +#define SMB_SUPER_MAGIC 0x517b
> +#endif
> +
> +#ifndef SMB2_SUPER_MAGIC
> +#define SMB2_SUPER_MAGIC 0xfe534d42
> +#endif
> +
> +#ifndef SNFS_SUPER_MAGIC
> +#define SNFS_SUPER_MAGIC 0xbeefdead
> +#endif
> +
> +#ifndef VMHGFS_SUPER_MAGIC
> +#define VMHGFS_SUPER_MAGIC 0xbacbacbc
> +#endif
> +
> +#ifndef VXFS_SUPER_MAGIC
> +#define VXFS_SUPER_MAGIC 0xa501fcf5
> +#endif
> diff --git a/config.mak.uname b/config.mak.uname
> index dacc95172dc..80d7e2a2e68 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -68,6 +68,17 @@ ifeq ($(uname_S),Linux)
>  	ifneq ($(findstring .el7.,$(uname_R)),)
>  		BASIC_CFLAGS += -std=c99
>  	endif
> +	ifeq ($(shell test -f /usr/include/linux/magic.h && echo y),y)
> +		HAVE_LINUX_MAGIC_H = YesPlease
> +	endif
> +	# The builtin FSMonitor on Linux builds upon Simple-IPC.  Both require
> +	# Unix domain sockets and PThreads.
> +	ifndef NO_PTHREADS
> +	ifndef NO_UNIX_SOCKETS
> +	FSMONITOR_DAEMON_BACKEND = linux
> +	FSMONITOR_OS_SETTINGS = linux
> +	endif
> +	endif
>  endif
>  ifeq ($(uname_S),GNU/kFreeBSD)
>  	HAVE_ALLOCA_H = YesPlease
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/7] fsmonitor: rebase with master
From: Patrick Steinhardt @ 2024-02-15 13:49 UTC (permalink / raw)
  To: Eric DeCosta via GitGitGadget
  Cc: git, Eric Sunshine [ ],
	Ævar Arnfjörð Bjarmason [ ], Glen Choo [ ],
	Johannes Schindelin [ ], Taylor Blau [ ], marzi, Eric DeCosta
In-Reply-To: <5973bbe18aeecf486d8256cc402285665c45e66a.1707992978.git.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 11745 bytes --]

On Thu, Feb 15, 2024 at 10:29:32AM +0000, Eric DeCosta via GitGitGadget wrote:
> From: Eric DeCosta <edecosta@mathworks.com>
> 
> rebased with master, and resolved conflicts

It would be a lot more useful if you adopted the original phrasing of
the commit:

```
fsmonitor: prepare to share code between Mac OS and Linux 

Linux and Mac OS can share some of the code originally developed for Mac OS.

Mac OS and Linux can share fsm-ipc-unix.c and fsm-settings-unix.c

Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
```

Depending on whether or not you have made significant changes during the
rebase I'd also convert the trailers to:

Patch-originally-by: Eric DeCoste <edecosta@mathworks.com>
Signed-off-by: Marzieh Esipreh <m.ispare63@gmail.com>

Furthermore, I think it would be useful if this commit was split up even
further than it already is. Having a preparatory patch that moves around
shareable code is a different topic than introducing the code skeleton
for Linux support, and as far as I can see the shared code does not end
up requiring anything from the new "*-linux.c" files.

Patrick

> Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
> ---
>  compat/fsmonitor/fsm-health-linux.c    | 24 ++++++++++
>  compat/fsmonitor/fsm-ipc-darwin.c      | 57 +----------------------
>  compat/fsmonitor/fsm-ipc-linux.c       |  1 +
>  compat/fsmonitor/fsm-ipc-unix.c        | 53 +++++++++++++++++++++
>  compat/fsmonitor/fsm-settings-darwin.c | 64 +-------------------------
>  compat/fsmonitor/fsm-settings-linux.c  |  1 +
>  compat/fsmonitor/fsm-settings-unix.c   | 61 ++++++++++++++++++++++++
>  7 files changed, 142 insertions(+), 119 deletions(-)
>  create mode 100644 compat/fsmonitor/fsm-health-linux.c
>  create mode 100644 compat/fsmonitor/fsm-ipc-linux.c
>  create mode 100644 compat/fsmonitor/fsm-ipc-unix.c
>  create mode 100644 compat/fsmonitor/fsm-settings-linux.c
>  create mode 100644 compat/fsmonitor/fsm-settings-unix.c
> 
> diff --git a/compat/fsmonitor/fsm-health-linux.c b/compat/fsmonitor/fsm-health-linux.c
> new file mode 100644
> index 00000000000..b9f709e8548
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-health-linux.c
> @@ -0,0 +1,24 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "fsmonitor.h"
> +#include "fsm-health.h"
> +#include "fsmonitor--daemon.h"
> +
> +int fsm_health__ctor(struct fsmonitor_daemon_state *state)
> +{
> +	return 0;
> +}
> +
> +void fsm_health__dtor(struct fsmonitor_daemon_state *state)
> +{
> +	return;
> +}
> +
> +void fsm_health__loop(struct fsmonitor_daemon_state *state)
> +{
> +	return;
> +}
> +
> +void fsm_health__stop_async(struct fsmonitor_daemon_state *state)
> +{
> +}
> diff --git a/compat/fsmonitor/fsm-ipc-darwin.c b/compat/fsmonitor/fsm-ipc-darwin.c
> index 6f3a95410cc..4c3c92081ee 100644
> --- a/compat/fsmonitor/fsm-ipc-darwin.c
> +++ b/compat/fsmonitor/fsm-ipc-darwin.c
> @@ -1,56 +1 @@
> -#include "git-compat-util.h"
> -#include "config.h"
> -#include "gettext.h"
> -#include "hex.h"
> -#include "path.h"
> -#include "repository.h"
> -#include "strbuf.h"
> -#include "fsmonitor-ll.h"
> -#include "fsmonitor-ipc.h"
> -#include "fsmonitor-path-utils.h"
> -
> -static GIT_PATH_FUNC(fsmonitor_ipc__get_default_path, "fsmonitor--daemon.ipc")
> -
> -const char *fsmonitor_ipc__get_path(struct repository *r)
> -{
> -	static const char *ipc_path = NULL;
> -	git_SHA_CTX sha1ctx;
> -	char *sock_dir = NULL;
> -	struct strbuf ipc_file = STRBUF_INIT;
> -	unsigned char hash[GIT_MAX_RAWSZ];
> -
> -	if (!r)
> -		BUG("No repository passed into fsmonitor_ipc__get_path");
> -
> -	if (ipc_path)
> -		return ipc_path;
> -
> -
> -	/* By default the socket file is created in the .git directory */
> -	if (fsmonitor__is_fs_remote(r->gitdir) < 1) {
> -		ipc_path = fsmonitor_ipc__get_default_path();
> -		return ipc_path;
> -	}
> -
> -	git_SHA1_Init(&sha1ctx);
> -	git_SHA1_Update(&sha1ctx, r->worktree, strlen(r->worktree));
> -	git_SHA1_Final(hash, &sha1ctx);
> -
> -	repo_config_get_string(r, "fsmonitor.socketdir", &sock_dir);
> -
> -	/* Create the socket file in either socketDir or $HOME */
> -	if (sock_dir && *sock_dir) {
> -		strbuf_addf(&ipc_file, "%s/.git-fsmonitor-%s",
> -					sock_dir, hash_to_hex(hash));
> -	} else {
> -		strbuf_addf(&ipc_file, "~/.git-fsmonitor-%s", hash_to_hex(hash));
> -	}
> -	free(sock_dir);
> -
> -	ipc_path = interpolate_path(ipc_file.buf, 1);
> -	if (!ipc_path)
> -		die(_("Invalid path: %s"), ipc_file.buf);
> -
> -	strbuf_release(&ipc_file);
> -	return ipc_path;
> -}
> +#include "fsm-ipc-unix.c"
> diff --git a/compat/fsmonitor/fsm-ipc-linux.c b/compat/fsmonitor/fsm-ipc-linux.c
> new file mode 100644
> index 00000000000..4c3c92081ee
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-ipc-linux.c
> @@ -0,0 +1 @@
> +#include "fsm-ipc-unix.c"
> diff --git a/compat/fsmonitor/fsm-ipc-unix.c b/compat/fsmonitor/fsm-ipc-unix.c
> new file mode 100644
> index 00000000000..eb25123fa12
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-ipc-unix.c
> @@ -0,0 +1,53 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "hex.h"
> +#include "strbuf.h"
> +#include "fsmonitor.h"
> +#include "fsmonitor-ipc.h"
> +#include "fsmonitor-path-utils.h"
> +
> +static GIT_PATH_FUNC(fsmonitor_ipc__get_default_path, "fsmonitor--daemon.ipc")
> +
> +const char *fsmonitor_ipc__get_path(struct repository *r)
> +{
> +	static const char *ipc_path = NULL;
> +	git_SHA_CTX sha1ctx;
> +	char *sock_dir = NULL;
> +	struct strbuf ipc_file = STRBUF_INIT;
> +	unsigned char hash[GIT_MAX_RAWSZ];
> +
> +	if (!r)
> +		BUG("No repository passed into fsmonitor_ipc__get_path");
> +
> +	if (ipc_path)
> +		return ipc_path;
> +
> +
> +	/* By default the socket file is created in the .git directory */
> +	if (fsmonitor__is_fs_remote(r->gitdir) < 1) {
> +		ipc_path = fsmonitor_ipc__get_default_path();
> +		return ipc_path;
> +	}
> +
> +	git_SHA1_Init(&sha1ctx);
> +	git_SHA1_Update(&sha1ctx, r->worktree, strlen(r->worktree));
> +	git_SHA1_Final(hash, &sha1ctx);
> +
> +	repo_config_get_string(r, "fsmonitor.socketdir", &sock_dir);
> +
> +	/* Create the socket file in either socketDir or $HOME */
> +	if (sock_dir && *sock_dir) {
> +		strbuf_addf(&ipc_file, "%s/.git-fsmonitor-%s",
> +					sock_dir, hash_to_hex(hash));
> +	} else {
> +		strbuf_addf(&ipc_file, "~/.git-fsmonitor-%s", hash_to_hex(hash));
> +	}
> +	free(sock_dir);
> +
> +	ipc_path = interpolate_path(ipc_file.buf, 1);
> +	if (!ipc_path)
> +		die(_("Invalid path: %s"), ipc_file.buf);
> +
> +	strbuf_release(&ipc_file);
> +	return ipc_path;
> +}
> diff --git a/compat/fsmonitor/fsm-settings-darwin.c b/compat/fsmonitor/fsm-settings-darwin.c
> index a3825906351..14baf9f0603 100644
> --- a/compat/fsmonitor/fsm-settings-darwin.c
> +++ b/compat/fsmonitor/fsm-settings-darwin.c
> @@ -1,63 +1 @@
> -#include "git-compat-util.h"
> -#include "config.h"
> -#include "fsmonitor-ll.h"
> -#include "fsmonitor-ipc.h"
> -#include "fsmonitor-settings.h"
> -#include "fsmonitor-path-utils.h"
> -
> - /*
> - * For the builtin FSMonitor, we create the Unix domain socket for the
> - * IPC in the .git directory.  If the working directory is remote,
> - * then the socket will be created on the remote file system.  This
> - * can fail if the remote file system does not support UDS file types
> - * (e.g. smbfs to a Windows server) or if the remote kernel does not
> - * allow a non-local process to bind() the socket.  (These problems
> - * could be fixed by moving the UDS out of the .git directory and to a
> - * well-known local directory on the client machine, but care should
> - * be taken to ensure that $HOME is actually local and not a managed
> - * file share.)
> - *
> - * FAT32 and NTFS working directories are problematic too.
> - *
> - * The builtin FSMonitor uses a Unix domain socket in the .git
> - * directory for IPC.  These Windows drive formats do not support
> - * Unix domain sockets, so mark them as incompatible for the daemon.
> - *
> - */
> -static enum fsmonitor_reason check_uds_volume(struct repository *r)
> -{
> -	struct fs_info fs;
> -	const char *ipc_path = fsmonitor_ipc__get_path(r);
> -	struct strbuf path = STRBUF_INIT;
> -	strbuf_add(&path, ipc_path, strlen(ipc_path));
> -
> -	if (fsmonitor__get_fs_info(dirname(path.buf), &fs) == -1) {
> -		strbuf_release(&path);
> -		return FSMONITOR_REASON_ERROR;
> -	}
> -
> -	strbuf_release(&path);
> -
> -	if (fs.is_remote ||
> -		!strcmp(fs.typename, "msdos") ||
> -		!strcmp(fs.typename, "ntfs")) {
> -		free(fs.typename);
> -		return FSMONITOR_REASON_NOSOCKETS;
> -	}
> -
> -	free(fs.typename);
> -	return FSMONITOR_REASON_OK;
> -}
> -
> -enum fsmonitor_reason fsm_os__incompatible(struct repository *r, int ipc)
> -{
> -	enum fsmonitor_reason reason;
> -
> -	if (ipc) {
> -		reason = check_uds_volume(r);
> -		if (reason != FSMONITOR_REASON_OK)
> -			return reason;
> -	}
> -
> -	return FSMONITOR_REASON_OK;
> -}
> +#include "fsm-settings-unix.c"
> diff --git a/compat/fsmonitor/fsm-settings-linux.c b/compat/fsmonitor/fsm-settings-linux.c
> new file mode 100644
> index 00000000000..14baf9f0603
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-settings-linux.c
> @@ -0,0 +1 @@
> +#include "fsm-settings-unix.c"
> diff --git a/compat/fsmonitor/fsm-settings-unix.c b/compat/fsmonitor/fsm-settings-unix.c
> new file mode 100644
> index 00000000000..d16dca89416
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-settings-unix.c
> @@ -0,0 +1,61 @@
> +#include "fsmonitor.h"
> +#include "fsmonitor-ipc.h"
> +#include "fsmonitor-path-utils.h"
> +
> + /*
> + * For the builtin FSMonitor, we create the Unix domain socket for the
> + * IPC in the .git directory.  If the working directory is remote,
> + * then the socket will be created on the remote file system.  This
> + * can fail if the remote file system does not support UDS file types
> + * (e.g. smbfs to a Windows server) or if the remote kernel does not
> + * allow a non-local process to bind() the socket.  (These problems
> + * could be fixed by moving the UDS out of the .git directory and to a
> + * well-known local directory on the client machine, but care should
> + * be taken to ensure that $HOME is actually local and not a managed
> + * file share.)
> + *
> + * FAT32 and NTFS working directories are problematic too.
> + *
> + * The builtin FSMonitor uses a Unix domain socket in the .git
> + * directory for IPC.  These Windows drive formats do not support
> + * Unix domain sockets, so mark them as incompatible for the daemon.
> + *
> + */
> +static enum fsmonitor_reason check_uds_volume(struct repository *r)
> +{
> +	struct fs_info fs;
> +	const char *ipc_path = fsmonitor_ipc__get_path(r);
> +	struct strbuf path = STRBUF_INIT;
> +	strbuf_addstr(&path, ipc_path);
> +
> +	if (fsmonitor__get_fs_info(dirname(path.buf), &fs) == -1) {
> +		free(fs.typename);
> +		strbuf_release(&path);
> +		return FSMONITOR_REASON_ERROR;
> +	}
> +
> +	strbuf_release(&path);
> +
> +	if (fs.is_remote ||
> +		!strcmp(fs.typename, "msdos") ||
> +		!strcmp(fs.typename, "ntfs")) {
> +		free(fs.typename);
> +		return FSMONITOR_REASON_NOSOCKETS;
> +	}
> +
> +	free(fs.typename);
> +	return FSMONITOR_REASON_OK;
> +}
> +
> +enum fsmonitor_reason fsm_os__incompatible(struct repository *r, int ipc)
> +{
> +	enum fsmonitor_reason reason;
> +
> +	if (ipc) {
> +		reason = check_uds_volume(r);
> +		if (reason != FSMONITOR_REASON_OK)
> +			return reason;
> +	}
> +
> +	return FSMONITOR_REASON_OK;
> +}
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* AW: git-archive: --add-virtual-file doesn't seem to respect --prefix
From: stefan.naewe @ 2024-02-15 11:48 UTC (permalink / raw)
  To: rmy, git
In-Reply-To: <65cdaa23.MHIIVADqwryAD0ON%rmy@frippery.org>

Classification - ISMS: Offen | VS: OFFEN

> -----Ursprüngliche Nachricht-----
> Von: Ron Yorston <rmy@frippery.org>
> Gesendet: Donnerstag, 15. Februar 2024 07:08
> An: git@vger.kernel.org
> Betreff: Bug: git-archive: --add-virtual-file doesn't seem to respect --prefix
> 
> The man page for git-archive has similar language regarding the --add-file and --
> add-virtual-file options:
> 
>    The path of the file in the archive is built by concatenating the
>    value of the last --prefix option (if any) before...
> 
> However this doesn't seem to be true for --add-virtual-file.  In any git repository:
> 
>    $ touch real_added_file
>    $ git archive --format=tar --prefix=prefix/ --add-file=real_added_file \
>       --add-virtual-file=virtual_added_file: HEAD | \
>       tar tvf - | grep added_file
>    -rw-rw-r-- root/root         0 2017-02-22 17:18 prefix/real_added_file
>    -rw-rw-r-- root/root         0 2017-02-22 17:18 virtual_added_file
> 
> I expected to see 'prefix/virtual_added_file'.
> 
> Ron

Maybe something like this does the job:


diff --git a/archive.c b/archive.c
index 941495f5d7..b6b885a632 100644
--- a/archive.c
+++ b/archive.c
@@ -331,11 +331,11 @@ int write_archive_entries(struct archiver_args *args,

                put_be64(fake_oid.hash, i + 1);

+               strbuf_reset(&path_in_archive);
+               if (info->base)
+                       strbuf_addstr(&path_in_archive, info->base);
+               strbuf_addstr(&path_in_archive, basename(path));
                if (!info->content) {
-                       strbuf_reset(&path_in_archive);
-                       if (info->base)
-                               strbuf_addstr(&path_in_archive, info->base);
-                       strbuf_addstr(&path_in_archive, basename(path));

                        strbuf_reset(&content);
                        if (strbuf_read_file(&content, path, info->stat.st_size) < 0)
@@ -347,7 +347,7 @@ int write_archive_entries(struct archiver_args *args,
                                                  content.buf, content.len);
                } else {
                        err = write_entry(args, &fake_oid,
-                                         path, strlen(path),
+                                         path_in_archive.buf, path_in_archive.len,
                                          canon_mode(info->stat.st_mode),
                                          info->content, info->stat.st_size);
                }


Stefan

^ permalink raw reply related

* Re: [PATCH v2 00/30] initial support for multiple hash functions
From: Patrick Steinhardt @ 2024-02-15 11:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On Sun, Oct 01, 2023 at 09:39:09PM -0500, Eric W. Biederman wrote:
> 
> This addresses all of the known test failures from v1 of this set of
> changes.  In particular I have reworked commit_tree_extended which
> was flagged by smatch, -Werror=array-bounds, and the leak detector.
> 
> One functional bug was fixed in repo_for_each_abbrev where it was
> mistakenly displaying too many ambiguous oids.
> 
> I am posting this so that people review and testing of this patchset
> won't be distracted by the known and fixed issues.

Thanks! I've reviewed this patch series up to patch 7.

I think the most important question mark I currently have is scalability
of the proposed object mapping format. The complexity to load the object
mappings is currently O(nlogn) and requires reading a file that is
easily hundreds of megabytes or even gigabytes in size.

I have a feeling that this needs to be addressed before such an object
mapping would be feasible for production use, or otherwise it would
incur too high a cost to be useful. I'm afraid that this will make the
whole series more complex to implement -- I'm sorry about that.

I've added comments and ideas regarding this issue on patch 6 and 7. It
could totally be that I'm missing the obvious though, or that my ideas
suck. Please don't hesitate to point that out if that is the case.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/7] fsmonitor: determine if filesystem is local or remote
From: Jean-Noël Avila @ 2024-02-15 11:24 UTC (permalink / raw)
  To: Eric DeCosta via GitGitGadget, git
  Cc: Eric Sunshine [ ], Ævar Arnfjörð Bjarmason [ ],
	Glen Choo [ ], Johannes Schindelin [ ], Taylor Blau [ ], marzi,
	Eric DeCosta
In-Reply-To: <d26de10866662a5bcd16d562cd1063dedd21cf02.1707992978.git.gitgitgadget@gmail.com>

Hello,

Le 15/02/2024 à 11:29, Eric DeCosta via GitGitGadget a écrit :
> From: Eric DeCosta <edecosta@mathworks.com>
> 
> Compare the given path to the mounted filesystems. Find the mount that is
> the longest prefix of the path (if any) and determine if that mount is on a
> local or remote filesystem.
> 
> Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
> ---
>   Makefile                                |   4 +
>   compat/fsmonitor/fsm-path-utils-linux.c | 195 ++++++++++++++++++++++++
>   compat/fsmonitor/fsm-path-utils-linux.h |  91 +++++++++++
>   config.mak.uname                        |  11 ++
>   4 files changed, 301 insertions(+)
>   create mode 100644 compat/fsmonitor/fsm-path-utils-linux.c
>   create mode 100644 compat/fsmonitor/fsm-path-utils-linux.h
> 
> diff --git a/Makefile b/Makefile
> index 78e874099d9..0f36a0fd83a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2088,6 +2088,10 @@ ifdef HAVE_CLOCK_GETTIME
>   	BASIC_CFLAGS += -DHAVE_CLOCK_GETTIME
>   endif
>   
> +ifdef HAVE_LINUX_MAGIC_H
> +	BASIC_CFLAGS += -DHAVE_LINUX_MAGIC_H
> +endif
> +
>   ifdef HAVE_CLOCK_MONOTONIC
>   	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
>   endif
> diff --git a/compat/fsmonitor/fsm-path-utils-linux.c b/compat/fsmonitor/fsm-path-utils-linux.c
> new file mode 100644
> index 00000000000..c21d1349532
> --- /dev/null
> +++ b/compat/fsmonitor/fsm-path-utils-linux.c
> @@ -0,0 +1,195 @@
> +#include "git-compat-util.h"
> +#include "abspath.h"
> +#include "fsmonitor.h"
> +#include "fsmonitor-path-utils.h"
> +#include "fsm-path-utils-linux.h"
> +#include <errno.h>
> +#include <mntent.h>
> +#include <sys/mount.h>
> +#include <sys/vfs.h>
> +#include <sys/statvfs.h>
> +
> +static int is_remote_fs(const char *path)
> +{
> +	struct statfs fs;
> +
> +	if (statfs(path, &fs))
> +		return error_errno(_("statfs('%s') failed"), path);

For the sake of simplifying of the work of translators, would it be wise 
to change this to

+	if (statfs(path, &fs))
+		/* TRANSLATORS: %s('%s') is a libc function call */
+		return error_errno(_("%s('%s') failed"), "statfs", +				path);

and generalize this to all other messages?

Thanks,

JN


^ permalink raw reply

* Re: [PATCH v2 07/30] object-file: update the loose object map when writing loose objects
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-7-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 14239 bytes --]

On Sun, Oct 01, 2023 at 09:40:11PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> To implement SHA1 compatibility on SHA256 repositories the loose
> object map needs to be updated whenver a loose object is written.

Not only when loose objects are written, but also when packfiles are
written e.g. when accepting a push via git-receive-pack(1). Basically,
whenever an object gets written into the main object database.

This also brings up another interesting angle: how will this work in the
context of alternate object directories? We have no control over new
objects being written into those, and thus the object mapping that we
have in our satellite repository that uses the alternate would be out of
date.

I think this is another indicator that stacking might be the right way
to go. Like that, the stack of object maps would be the main stack plus
all stack of alternates concatenated. Finding a mapping would then have
to go through all of these maps to find the desired object.

> Updating the loose object map this way allows git to support
> the old hash algorithm in constant time.

As mentioned before, appending objects is constant-time, but the reading
side is unfortunately not. It's probably more something like `O(nlogn)`
because we have to load all objects and add each of the objects into the
map, which I expect to be `O(logn)`. So the reading time isn't even
linear.

Patrick

> The functions write_loose_object, and stream_loose_object are
> the only two functions that write to the loose object store.
> 
> Update stream_loose_object to compute the compatibiilty hash, update
> the loose object, and then call repo_add_loose_object_map to update
> the loose object map.
> 
> Update write_object_file_flags to convert the object into
> it's compatibility encoding, hash the compatibility encoding,
> write the object, and then update the loose object map.
> 
> Update force_object_loose to lookup the hash of the compatibility
> encoding, write the loose object, and then update the loose object
> map.
> 
> Update write_object_file_literally to convert the object into it's
> compatibility hash encoding, hash the compatibility enconding, write
> the object, and then update the loose object map, when the type string
> is a known type.  For objects with an unknown type this results in a
> partially broken repository, as the objects are not mapped.
> 
> The point of write_object_file_literally is to generate a partially
> broken repository for testing.  For testing skipping writing the loose
> object map is much more useful than refusing to write the broken
> object at all.
> 
> Except that the loose objects are updated before the loose object map
> I have not done any analysis to see how robust this scheme is in the
> event of failure.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  object-file.c | 113 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 95 insertions(+), 18 deletions(-)
> 
> diff --git a/object-file.c b/object-file.c
> index 7dc0c4bfbba8..4e55f475b3b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -43,6 +43,8 @@
>  #include "setup.h"
>  #include "submodule.h"
>  #include "fsck.h"
> +#include "loose.h"
> +#include "object-file-convert.h"
>  
>  /* The maximum size for an object header. */
>  #define MAX_HEADER_LEN 32
> @@ -1952,9 +1954,12 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>  				     const char *filename, unsigned flags,
>  				     git_zstream *stream,
>  				     unsigned char *buf, size_t buflen,
> -				     git_hash_ctx *c,
> +				     git_hash_ctx *c, git_hash_ctx *compat_c,
>  				     char *hdr, int hdrlen)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int fd;
>  
>  	fd = create_tmpfile(tmp_file, filename);
> @@ -1974,14 +1979,18 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>  	git_deflate_init(stream, zlib_compression_level);
>  	stream->next_out = buf;
>  	stream->avail_out = buflen;
> -	the_hash_algo->init_fn(c);
> +	algo->init_fn(c);
> +	if (compat && compat_c)
> +		compat->init_fn(compat_c);
>  
>  	/*  Start to feed header to zlib stream */
>  	stream->next_in = (unsigned char *)hdr;
>  	stream->avail_in = hdrlen;
>  	while (git_deflate(stream, 0) == Z_OK)
>  		; /* nothing */
> -	the_hash_algo->update_fn(c, hdr, hdrlen);
> +	algo->update_fn(c, hdr, hdrlen);
> +	if (compat && compat_c)
> +		compat->update_fn(compat_c, hdr, hdrlen);
>  
>  	return fd;
>  }
> @@ -1990,16 +1999,21 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>   * Common steps for the inner git_deflate() loop for writing loose
>   * objects. Returns what git_deflate() returns.
>   */
> -static int write_loose_object_common(git_hash_ctx *c,
> +static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
>  				     git_zstream *stream, const int flush,
>  				     unsigned char *in0, const int fd,
>  				     unsigned char *compressed,
>  				     const size_t compressed_len)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int ret;
>  
>  	ret = git_deflate(stream, flush ? Z_FINISH : 0);
> -	the_hash_algo->update_fn(c, in0, stream->next_in - in0);
> +	algo->update_fn(c, in0, stream->next_in - in0);
> +	if (compat && compat_c)
> +		compat->update_fn(compat_c, in0, stream->next_in - in0);
>  	if (write_in_full(fd, compressed, stream->next_out - compressed) < 0)
>  		die_errno(_("unable to write loose object file"));
>  	stream->next_out = compressed;
> @@ -2014,15 +2028,21 @@ static int write_loose_object_common(git_hash_ctx *c,
>   * - End the compression of zlib stream.
>   * - Get the calculated oid to "oid".
>   */
> -static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream,
> -				   struct object_id *oid)
> +static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
> +				   git_zstream *stream, struct object_id *oid,
> +				   struct object_id *compat_oid)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int ret;
>  
>  	ret = git_deflate_end_gently(stream);
>  	if (ret != Z_OK)
>  		return ret;
> -	the_hash_algo->final_oid_fn(oid, c);
> +	algo->final_oid_fn(oid, c);
> +	if (compat && compat_c)
> +		compat->final_oid_fn(compat_oid, compat_c);
>  
>  	return Z_OK;
>  }
> @@ -2046,7 +2066,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  
>  	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
>  				       &stream, compressed, sizeof(compressed),
> -				       &c, hdr, hdrlen);
> +				       &c, NULL, hdr, hdrlen);
>  	if (fd < 0)
>  		return -1;
>  
> @@ -2056,14 +2076,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  	do {
>  		unsigned char *in0 = stream.next_in;
>  
> -		ret = write_loose_object_common(&c, &stream, 1, in0, fd,
> +		ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd,
>  						compressed, sizeof(compressed));
>  	} while (ret == Z_OK);
>  
>  	if (ret != Z_STREAM_END)
>  		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
>  		    ret);
> -	ret = end_loose_object_common(&c, &stream, &parano_oid);
> +	ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL);
>  	if (ret != Z_OK)
>  		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
>  		    ret);
> @@ -2108,10 +2128,12 @@ static int freshen_packed_object(const struct object_id *oid)
>  int stream_loose_object(struct input_stream *in_stream, size_t len,
>  			struct object_id *oid)
>  {
> +	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
> +	struct object_id compat_oid;
>  	int fd, ret, err = 0, flush = 0;
>  	unsigned char compressed[4096];
>  	git_zstream stream;
> -	git_hash_ctx c;
> +	git_hash_ctx c, compat_c;
>  	struct strbuf tmp_file = STRBUF_INIT;
>  	struct strbuf filename = STRBUF_INIT;
>  	int dirlen;
> @@ -2135,7 +2157,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	 */
>  	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
>  				       &stream, compressed, sizeof(compressed),
> -				       &c, hdr, hdrlen);
> +				       &c, &compat_c, hdr, hdrlen);
>  	if (fd < 0) {
>  		err = -1;
>  		goto cleanup;
> @@ -2153,7 +2175,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  			if (in_stream->is_finished)
>  				flush = 1;
>  		}
> -		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
> +		ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd,
>  						compressed, sizeof(compressed));
>  		/*
>  		 * Unlike write_loose_object(), we do not have the entire
> @@ -2176,7 +2198,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	 */
>  	if (ret != Z_STREAM_END)
>  		die(_("unable to stream deflate new object (%d)"), ret);
> -	ret = end_loose_object_common(&c, &stream, oid);
> +	ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid);
>  	if (ret != Z_OK)
>  		die(_("deflateEnd on stream object failed (%d)"), ret);
>  	close_loose_object(fd, tmp_file.buf);
> @@ -2203,6 +2225,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	}
>  
>  	err = finalize_object_file(tmp_file.buf, filename.buf);
> +	if (!err && compat)
> +		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
>  cleanup:
>  	strbuf_release(&tmp_file);
>  	strbuf_release(&filename);
> @@ -2213,17 +2237,38 @@ int write_object_file_flags(const void *buf, unsigned long len,
>  			    enum object_type type, struct object_id *oid,
>  			    unsigned flags)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
> +	struct object_id compat_oid;
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen = sizeof(hdr);
>  
> +	/* Generate compat_oid */
> +	if (compat) {
> +		if (type == OBJ_BLOB)
> +			hash_object_file(compat, buf, len, type, &compat_oid);
> +		else {
> +			struct strbuf converted = STRBUF_INIT;
> +			convert_object_file(&converted, algo, compat,
> +					    buf, len, type, 0);
> +			hash_object_file(compat, converted.buf, converted.len,
> +					 type, &compat_oid);
> +			strbuf_release(&converted);
> +		}
> +	}
> +
>  	/* Normally if we have it in the pack then we do not bother writing
>  	 * it out into .git/objects/??/?{38} file.
>  	 */
> -	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
> -				  &hdrlen);
> +	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
>  	if (freshen_packed_object(oid) || freshen_loose_object(oid))
>  		return 0;
> -	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
> +	if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags))
> +		return -1;
> +	if (compat)
> +		return repo_add_loose_object_map(repo, oid, &compat_oid);
> +	return 0;
>  }
>  
>  int write_object_file_literally(const void *buf, unsigned long len,
> @@ -2231,7 +2276,27 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  				unsigned flags)
>  {
>  	char *header;
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
> +	struct object_id compat_oid;
>  	int hdrlen, status = 0;
> +	int compat_type = -1;
> +
> +	if (compat) {
> +		compat_type = type_from_string_gently(type, -1, 1);
> +		if (compat_type == OBJ_BLOB)
> +			hash_object_file(compat, buf, len, compat_type,
> +					 &compat_oid);
> +		else if (compat_type != -1) {
> +			struct strbuf converted = STRBUF_INIT;
> +			convert_object_file(&converted, algo, compat,
> +					    buf, len, compat_type, 0);
> +			hash_object_file(compat, converted.buf, converted.len,
> +					 compat_type, &compat_oid);
> +			strbuf_release(&converted);
> +		}
> +	}
>  
>  	/* type string, SP, %lu of the length plus NUL must fit this */
>  	hdrlen = strlen(type) + MAX_HEADER_LEN;
> @@ -2244,6 +2309,8 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  	if (freshen_packed_object(oid) || freshen_loose_object(oid))
>  		goto cleanup;
>  	status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0);
> +	if (compat_type != -1)
> +		return repo_add_loose_object_map(repo, oid, &compat_oid);
>  
>  cleanup:
>  	free(header);
> @@ -2252,9 +2319,12 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  
>  int force_object_loose(const struct object_id *oid, time_t mtime)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	void *buf;
>  	unsigned long len;
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	struct object_id compat_oid;
>  	enum object_type type;
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen;
> @@ -2267,8 +2337,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
>  	oi.contentp = &buf;
>  	if (oid_object_info_extended(the_repository, oid, &oi, 0))
>  		return error(_("cannot read object for %s"), oid_to_hex(oid));
> +	if (compat) {
> +		if (repo_oid_to_algop(repo, oid, compat, &compat_oid))
> +			return error(_("cannot map object %s to %s"),
> +				     oid_to_hex(oid), compat->name);
> +	}
>  	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
>  	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
> +	if (!ret && compat)
> +		ret = repo_add_loose_object_map(the_repository, oid, &compat_oid);
>  	free(buf);
>  
>  	return ret;
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 06/30] loose: compatibilty short name support
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-6-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4497 bytes --]

On Sun, Oct 01, 2023 at 09:40:10PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Update loose_objects_cache when udpating the loose objects map.  This
> oidtree is used to discover which oids are possibilities when
> resolving short names, and it can support a mixture of sha1
> and sha256 oids.
> 
> With this any oid recorded objects/loose-objects-idx is usable
> for resolving an oid to an object.
> 
> To make this maintainable a helper insert_loose_map is factored
> out of load_one_loose_object_map and repo_add_loose_object_map,
> and then modified to also update the loose_objects_cache.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  loose.c | 37 +++++++++++++++++++++++++------------
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/loose.c b/loose.c
> index 6ba73cc84dca..f6faa6216a08 100644
> --- a/loose.c
> +++ b/loose.c
> @@ -7,6 +7,7 @@
>  #include "gettext.h"
>  #include "loose.h"
>  #include "lockfile.h"
> +#include "oidtree.h"
>  
>  static const char *loose_object_header = "# loose-object-idx\n";
>  
> @@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const
>  	return 1;
>  }
>  
> +static int insert_loose_map(struct object_directory *odb,
> +			    const struct object_id *oid,
> +			    const struct object_id *compat_oid)

I think it would've been nice to fold this into the preceding patch
already. At least I wanted to propose adding such a function to avoid
the duplication down below.

Patrick

> +{
> +	struct loose_object_map *map = odb->loose_map;
> +	int inserted = 0;
> +
> +	inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
> +	inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
> +	if (inserted)
> +		oidtree_insert(odb->loose_objects_cache, compat_oid);
> +
> +	return inserted;
> +}
> +
>  static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
>  {
>  	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> @@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
>  
>  	if (!dir->loose_map)
>  		loose_object_map_init(&dir->loose_map);
> +	if (!dir->loose_objects_cache) {
> +		ALLOC_ARRAY(dir->loose_objects_cache, 1);
> +		oidtree_init(dir->loose_objects_cache);
> +	}
>  
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
> -
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
> -
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
> +	insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> +	insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> +	insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
>  
>  	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
>  	fp = fopen(path.buf, "rb");
> @@ -77,8 +92,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
>  		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
>  		    p != buf.buf + buf.len)
>  			goto err;
> -		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
> -		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
> +		insert_loose_map(dir, &oid, &compat_oid);
>  	}
>  
>  	strbuf_release(&buf);
> @@ -197,8 +211,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o
>  	if (!should_use_loose_object_map(repo))
>  		return 0;
>  
> -	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
> -	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
> +	inserted = insert_loose_map(repo->objects->odb, oid, compat_oid);
>  	if (inserted)
>  		return write_one_object(repo, oid, compat_oid);
>  	return 0;
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-5-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 17174 bytes --]

On Sun, Oct 01, 2023 at 09:40:09PM -0500, Eric W. Biederman wrote:
> From: "brian m. carlson" <sandals@crustytoothpaste.net>
> 
> As part of the transition plan, we'd like to add a file in the .git
> directory that maps loose objects between SHA-1 and SHA-256.  Let's
> implement the specification in the transition plan and store this data
> on a per-repository basis in struct repository.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  Makefile              |   1 +
>  loose.c               | 246 ++++++++++++++++++++++++++++++++++++++++++
>  loose.h               |  22 ++++
>  object-file-convert.c |  14 ++-
>  object-store-ll.h     |   3 +
>  object.c              |   2 +
>  repository.c          |   6 ++
>  7 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 loose.c
>  create mode 100644 loose.h
> 
> diff --git a/Makefile b/Makefile
> index f7e824f25cda..3c18664def9a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
>  LIB_OBJS += list-objects.o
>  LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
> +LIB_OBJS += loose.o
>  LIB_OBJS += ls-refs.o
>  LIB_OBJS += mailinfo.o
>  LIB_OBJS += mailmap.o
> diff --git a/loose.c b/loose.c
> new file mode 100644
> index 000000000000..6ba73cc84dca
> --- /dev/null
> +++ b/loose.c

When reading "loose" I immediately think about loose objects, only. I
would not consider this about mapping object IDs, which I expect would
also happen for packed objects?

It very much seems like you explicitly only care about loose objects in
the code here, which is weird to me. If that is in fact intentional
because we learn to store the compat object hash in pack files over the
course of this patch seires then it would make sense to explain this a
bit more in depth.

> @@ -0,0 +1,246 @@
> +#include "git-compat-util.h"
> +#include "hash.h"
> +#include "path.h"
> +#include "object-store.h"
> +#include "hex.h"
> +#include "wrapper.h"
> +#include "gettext.h"
> +#include "loose.h"
> +#include "lockfile.h"
> +
> +static const char *loose_object_header = "# loose-object-idx\n";
> +
> +static inline int should_use_loose_object_map(struct repository *repo)
> +{
> +	return repo->compat_hash_algo && repo->gitdir;
> +}
> +
> +void loose_object_map_init(struct loose_object_map **map)
> +{
> +	struct loose_object_map *m;
> +	m = xmalloc(sizeof(**map));
> +	m->to_compat = kh_init_oid_map();
> +	m->to_storage = kh_init_oid_map();
> +	*map = m;
> +}
> +
> +static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value)
> +{
> +	khiter_t pos;
> +	int ret;
> +	struct object_id *stored;
> +
> +	pos = kh_put_oid_map(map, *key, &ret);
> +
> +	/* This item already exists in the map. */
> +	if (ret == 0)
> +		return 0;

Should we safeguard this and compare whether the key's value matches the
passed-in value? One of the more general themes that I'm worried about
is what happens when we hit hash collisions (e.g. two objects mapping to
the same SHA1, but different SHA256 hashes), and safeguarding us against
this possibility feels sensible to me.

> +	stored = xmalloc(sizeof(*stored));
> +	oidcpy(stored, value);
> +	kh_value(map, pos) = stored;
> +	return 1;
> +}
> +
> +static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
> +{
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +	FILE *fp;
> +
> +	if (!dir->loose_map)
> +		loose_object_map_init(&dir->loose_map);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	fp = fopen(path.buf, "rb");
> +	if (!fp) {
> +		strbuf_release(&path);
> +		return 0;

I think we should discern ENOENT from other errors. Failing gracefully
when the file doesn't exist may be sensible, but not when we failed due
to something like an I/O error.

> +	}
> +
> +	errno = 0;
> +	if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
> +		goto err;
> +	while (!strbuf_getline_lf(&buf, fp)) {
> +		const char *p;
> +		struct object_id oid, compat_oid;
> +		if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
> +		    *p++ != ' ' ||
> +		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
> +		    p != buf.buf + buf.len)
> +			goto err;
> +		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
> +		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
> +	}

Is the actual format specified anywhere? I have to wonder about the
scalability of such a format that uses a simple line-based format for
every object ID. Two main concerns:

  1. If the format is unsorted and we simply append to it whenever the
     repo gains new objects then we are forced to always load the
     complete map into memory. This would be quite inefficient in larger
     repositories that have millions of objects. Every line contains two
     object hashes as well as two whitespace characters, which amounts
     to `(2 + 40 + 64) * $numobjects` many bytes.

     For linux.git with more than 10 million objects, the map would thus
     be around 1GB in size. Loading that into memory and converting it
     into maps feels prohibitively expensive to me.

  2. If the format was sorted then we could perform binary searches
     inside the format to look up object IDs because we know that each
     line has a fixed length. On the other hand, adding new objects
     would require us to rewrite the whole file every time.

I think loading the complete object map into memory is simply too
expensive in any larger "real-world" repository. But rewriting a sorted
file format every time we add new objects feels sufficiently expensive,
too. Neither of these properties sounds like it would be feasible to use
for larger Git hosting platforms. So I think we should put some more
thought into this.

Some proposals:

  - We shouldn't store hex characters but raw object IDs, thus reducing
    the size of the file by almost half.

  - We should store the file sorted so that we can avoid loading it into
    memory and do binary searches.

  - We might grow this into a "stack" of object maps so that it becomes
    easier to add new objects to the map without having to rewrite it
    every time. With geometric repacking this should be somewhat
    manageable.

We don't have to do all of this right from the beginning, I just want to
start the discussion around this.

> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return errno ? -1 : 0;

It feels quite fragile to me to check for `errno` in this way. Should we
instead check `ferror(fp)`?

> +err:
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return -1;
> +}

We could deduplicate the error paths by storing the return value into an
`int ret`.

> +int repo_read_loose_object_map(struct repository *repo)
> +{
> +	struct object_directory *dir;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	prepare_alt_odb(repo);
> +
> +	for (dir = repo->objects->odb; dir; dir = dir->next) {
> +		if (load_one_loose_object_map(repo, dir) < 0) {
> +			return -1;
> +		}
> +	}

The braces here are not needed.

> +	return 0;
> +}
> +
> +int repo_write_loose_object_map(struct repository *repo)
> +{
> +	kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
> +	struct lock_file lock;
> +	int fd;
> +	khiter_t iter;
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
> +	iter = kh_begin(map);
> +	if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
> +		goto errout;
> +
> +	for (; iter != kh_end(map); iter++) {
> +		if (kh_exist(map, iter)) {
> +			if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
> +			    oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
> +				continue;
> +			strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
> +			if (write_in_full(fd, buf.buf, buf.len) < 0)
> +				goto errout;
> +			strbuf_reset(&buf);
> +		}
> +	}
> +	strbuf_release(&buf);
> +	if (commit_lock_file(&lock) < 0) {
> +		error_errno(_("could not write loose object index %s"), path.buf);
> +		strbuf_release(&path);
> +		return -1;
> +	}
> +	strbuf_release(&path);
> +	return 0;
> +errout:
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	error_errno(_("failed to write loose object index %s\n"), path.buf);
> +	strbuf_release(&path);
> +	return -1;

Same here, we should be able to combine cleanup of both the successful
and error paths. It's safe to call `rollback_lock_file()` even if the
file has already been committed.

> +}
> +
> +static int write_one_object(struct repository *repo, const struct object_id *oid,
> +			    const struct object_id *compat_oid)
> +{
> +	struct lock_file lock;
> +	int fd;
> +	struct stat st;
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
> +
> +	fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666);
> +	if (fd < 0)
> +		goto errout;
> +	if (fstat(fd, &st) < 0)
> +		goto errout;
> +	if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
> +		goto errout;
> +
> +	strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid));
> +	if (write_in_full(fd, buf.buf, buf.len) < 0)
> +		goto errout;
> +	if (close(fd))
> +		goto errout;

It's not safe to update the file in-place like this. A concurrent reader
may end up seeing partial lines and error out. Also, if we were to crash
we might easily end up with a corrupted mapping file.

> +	adjust_shared_perm(path.buf);
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return 0;
> +errout:
> +	error_errno(_("failed to write loose object index %s\n"), path.buf);
> +	close(fd);
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return -1;

Same.

> +}
> +
> +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
> +			      const struct object_id *compat_oid)
> +{
> +	int inserted = 0;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
> +	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
> +	if (inserted)
> +		return write_one_object(repo, oid, compat_oid);
> +	return 0;
> +}
> +
> +int repo_loose_object_map_oid(struct repository *repo,
> +			      const struct object_id *src,
> +			      const struct git_hash_algo *to,
> +			      struct object_id *dest)
> +{
> +	struct object_directory *dir;
> +	kh_oid_map_t *map;
> +	khiter_t pos;
> +
> +	for (dir = repo->objects->odb; dir; dir = dir->next) {
> +		struct loose_object_map *loose_map = dir->loose_map;
> +		if (!loose_map)
> +			continue;
> +		map = (to == repo->compat_hash_algo) ?
> +			loose_map->to_compat :
> +			loose_map->to_storage;
> +		pos = kh_get_oid_map(map, *src);
> +		if (pos < kh_end(map)) {
> +			oidcpy(dest, kh_value(map, pos));
> +			return 0;
> +		}
> +	}
> +	return -1;
> +}
> +
> +void loose_object_map_clear(struct loose_object_map **map)

Nit: I'd rather call it `loose_object_map_release()`. `clear` typically
indicates that we clear contents, but do not end up freeing the
containing structure.

> +{
> +	struct loose_object_map *m = *map;
> +	struct object_id *oid;
> +
> +	if (!m)
> +		return;
> +
> +	kh_foreach_value(m->to_compat, oid, free(oid));
> +	kh_foreach_value(m->to_storage, oid, free(oid));
> +	kh_destroy_oid_map(m->to_compat);
> +	kh_destroy_oid_map(m->to_storage);
> +	free(m);
> +	*map = NULL;
> +}
> diff --git a/loose.h b/loose.h
> new file mode 100644
> index 000000000000..2c2957072c5f
> --- /dev/null
> +++ b/loose.h
> @@ -0,0 +1,22 @@
> +#ifndef LOOSE_H
> +#define LOOSE_H
> +
> +#include "khash.h"
> +
> +struct loose_object_map {
> +	kh_oid_map_t *to_compat;
> +	kh_oid_map_t *to_storage;
> +};

Any specific reason why you don't use `struct oidmap` here?

Patrick

> +void loose_object_map_init(struct loose_object_map **map);
> +void loose_object_map_clear(struct loose_object_map **map);
> +int repo_loose_object_map_oid(struct repository *repo,
> +			      const struct object_id *src,
> +			      const struct git_hash_algo *dest_algo,
> +			      struct object_id *dest);
> +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
> +			      const struct object_id *compat_oid);
> +int repo_read_loose_object_map(struct repository *repo);
> +int repo_write_loose_object_map(struct repository *repo);
> +
> +#endif
> diff --git a/object-file-convert.c b/object-file-convert.c
> index 4777aba83636..1ec945eaa17f 100644
> --- a/object-file-convert.c
> +++ b/object-file-convert.c
> @@ -4,6 +4,7 @@
>  #include "repository.h"
>  #include "hash-ll.h"
>  #include "object.h"
> +#include "loose.h"
>  #include "object-file-convert.h"
>  
>  int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> @@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
>  			oidcpy(dest, src);
>  		return 0;
>  	}
> -	return -1;
> +	if (repo_loose_object_map_oid(repo, src, to, dest)) {
> +		/*
> +		 * We may have loaded the object map at repo initialization but
> +		 * another process (perhaps upstream of a pipe from us) may have
> +		 * written a new object into the map.  If the object is missing,
> +		 * let's reload the map to see if the object has appeared.
> +		 */
> +		repo_read_loose_object_map(repo);
> +		if (repo_loose_object_map_oid(repo, src, to, dest))
> +			return -1;
> +	}
> +	return 0;
>  }
>  
>  int convert_object_file(struct strbuf *outbuf,
> diff --git a/object-store-ll.h b/object-store-ll.h
> index 26a3895c821c..bc76d6bec80d 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -26,6 +26,9 @@ struct object_directory {
>  	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
>  	struct oidtree *loose_objects_cache;
>  
> +	/* Map between object IDs for loose objects. */
> +	struct loose_object_map *loose_map;
> +
>  	/*
>  	 * This is a temporary object store created by the tmp_objdir
>  	 * facility. Disable ref updates since the objects in the store
> diff --git a/object.c b/object.c
> index 2c61e4c86217..186a0a47c0fb 100644
> --- a/object.c
> +++ b/object.c
> @@ -13,6 +13,7 @@
>  #include "alloc.h"
>  #include "packfile.h"
>  #include "commit-graph.h"
> +#include "loose.h"
>  
>  unsigned int get_max_object_index(void)
>  {
> @@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb)
>  {
>  	free(odb->path);
>  	odb_clear_loose_cache(odb);
> +	loose_object_map_clear(&odb->loose_map);
>  	free(odb);
>  }
>  
> diff --git a/repository.c b/repository.c
> index 80252b79e93e..6214f61cf4e7 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -14,6 +14,7 @@
>  #include "read-cache-ll.h"
>  #include "remote.h"
>  #include "setup.h"
> +#include "loose.h"
>  #include "submodule-config.h"
>  #include "sparse-index.h"
>  #include "trace2.h"
> @@ -109,6 +110,8 @@ void repo_set_compat_hash_algo(struct repository *repo, int algo)
>  	if (hash_algo_by_ptr(repo->hash_algo) == algo)
>  		BUG("hash_algo and compat_hash_algo match");
>  	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +	if (repo->compat_hash_algo)
> +		repo_read_loose_object_map(repo);
>  }
>  
>  /*
> @@ -201,6 +204,9 @@ int repo_init(struct repository *repo,
>  	if (worktree)
>  		repo_set_worktree(repo, worktree);
>  
> +	if (repo->compat_hash_algo)
> +		repo_read_loose_object_map(repo);
> +
>  	clear_repository_format(&format);
>  	return 0;
>  
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 04/30] repository: add a compatibility hash algorithm
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-4-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4674 bytes --]

On Sun, Oct 01, 2023 at 09:40:08PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> We currently have support for using a full stage 4 SHA-256
> implementation.

What is a "full stage 4 SHA-256 implementation"? I was assuming that you
referred to "Documentation/technical/hash-function-transition.txt", but
it does not mention stages either.

> However, we'd like to support interoperability with
> SHA-1 repositories as well.  The transition plan anticipates a
> compatibility hash algorithm configuration option that we can use to
> implement support for this.  Let's add an element to the repository
> structure that indicates the compatibility hash algorithm so we can use
> it when we need to consider interoperability between algorithms.
> 
> Add a helper function repo_set_compat_hash_algo that takes a
> compatibility hash algorithm and sets "repo->compat_hash_algo".  If
> GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
> "repo->compat_hash_algo" is set to NULL.
> 
> For now, the code results in "repo->compat_hash_algo" always being set
> to NULL, but that will change once a configuration option is added.
> 
> Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  repository.c | 8 ++++++++
>  repository.h | 4 ++++
>  setup.c      | 3 +++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/repository.c b/repository.c
> index a7679ceeaa45..80252b79e93e 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
>  	repo->hash_algo = &hash_algos[hash_algo];
>  }
>  
> +void repo_set_compat_hash_algo(struct repository *repo, int algo)
> +{
> +	if (hash_algo_by_ptr(repo->hash_algo) == algo)
> +		BUG("hash_algo and compat_hash_algo match");
> +	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +}
> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
>  	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
> diff --git a/repository.h b/repository.h
> index 5f18486f6465..bf3fc601cc53 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -160,6 +160,9 @@ struct repository {
>  	/* Repository's current hash algorithm, as serialized on disk. */
>  	const struct git_hash_algo *hash_algo;
>  
> +	/* Repository's compatibility hash algorithm. */
> +	const struct git_hash_algo *compat_hash_algo;
> +
>  	/* A unique-id for tracing purposes. */
>  	int trace2_repo_id;
>  
> @@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
>  		     const struct set_gitdir_args *extra_args);
>  void repo_set_worktree(struct repository *repo, const char *path);
>  void repo_set_hash_algo(struct repository *repo, int algo);
> +void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
>  void initialize_the_repository(void);
>  RESULT_MUST_BE_USED
>  int repo_init(struct repository *r, const char *gitdir, const char *worktree);
> diff --git a/setup.c b/setup.c
> index 18927a847b86..aa8bf5da5226 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			repo_set_compat_hash_algo(the_repository,
> +						  GIT_HASH_UNKNOWN);
>  			the_repository->repository_format_worktree_config =
>  				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
> @@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
>  	the_repository->repository_format_worktree_config =
>  		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =

There's also `init_db()`, where we call `repo_set_hash_algo()`. Would we
have to call `repo_set_compat_hash_algo()` there, too? There are some
other locations when handling remotes or clones, but I don't think those
are relevant right now.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 03/30] object-names: support input of oids in any supported hash
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-3-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 10216 bytes --]

On Sun, Oct 01, 2023 at 09:40:07PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Support short oids encoded in any algorithm, while ensuring enough of
> the oid is specified to disambiguate between all of the oids in the
> repository encoded in any algorithm.
> 
> By default have the code continue to only accept oids specified in the
> storage hash algorithm of the repository, but when something is
> ambiguous display all of the possible oids from any accepted oid
> encoding.
> 
> A new flag is added GET_OID_HASH_ANY that when supplied causes the
> code to accept oids specified in any hash algorithm, and to return the
> oids that were resolved.
> 
> This implements the functionality that allows both SHA-1 and SHA-256
> object names, from the "Object names on the command line" section of
> the hash function transition document.
> 
> Care is taken in get_short_oid so that when the result is ambiguous
> the output remains the same if GIT_OID_HASH_ANY was not supplied.  If
> GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
> the prefix are displayed.
> 
> This required updating repo_for_each_abbrev to give it a parameter so
> that it knows to look at all hash algorithms.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  builtin/rev-parse.c |  2 +-
>  hash-ll.h           |  1 +
>  object-name.c       | 46 ++++++++++++++++++++++++++++++++++-----------
>  object-name.h       |  3 ++-
>  4 files changed, 39 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index fde8861ca4e0..43e96765400c 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -882,7 +882,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>  				continue;
>  			}
>  			if (skip_prefix(arg, "--disambiguate=", &arg)) {
> -				repo_for_each_abbrev(the_repository, arg,
> +				repo_for_each_abbrev(the_repository, arg, the_hash_algo,
>  						     show_abbrev, NULL);
>  				continue;
>  			}
> diff --git a/hash-ll.h b/hash-ll.h
> index 10d84cc20888..2cfde63ae1cf 100644
> --- a/hash-ll.h
> +++ b/hash-ll.h
> @@ -145,6 +145,7 @@ struct object_id {
>  #define GET_OID_RECORD_PATH     0200
>  #define GET_OID_ONLY_TO_DIE    04000
>  #define GET_OID_REQUIRE_PATH  010000
> +#define GET_OID_HASH_ANY      020000
>  
>  #define GET_OID_DISAMBIGUATORS \
>  	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
> diff --git a/object-name.c b/object-name.c
> index 0bfa29dbbfe9..7dd6e5e47566 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -25,6 +25,7 @@
>  #include "midx.h"
>  #include "commit-reach.h"
>  #include "date.h"
> +#include "object-file-convert.h"
>  
>  static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
>  
> @@ -49,6 +50,7 @@ struct disambiguate_state {
>  
>  static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
>  {
> +	/* The hash algorithm of current has already been filtered */
>  	if (ds->always_call_fn) {
>  		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
>  		return;
> @@ -134,6 +136,8 @@ static void unique_in_midx(struct multi_pack_index *m,
>  {
>  	uint32_t num, i, first = 0;
>  	const struct object_id *current = NULL;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;

`hexsz` is not an `int`, but a `size_t`. `match_hash()` of course uses
a third type `unsigned` instead, adding to the confusion.

>  	num = m->num_objects;
>  
>  	if (!num)
> @@ -149,7 +153,7 @@ static void unique_in_midx(struct multi_pack_index *m,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		current = nth_midxed_object_oid(&oid, m, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
>  			break;
>  		update_candidates(ds, current);
>  	}
> @@ -159,6 +163,8 @@ static void unique_in_pack(struct packed_git *p,
>  			   struct disambiguate_state *ds)
>  {
>  	uint32_t num, i, first = 0;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;
>  
>  	if (p->multi_pack_index)
>  		return;
> @@ -177,7 +183,7 @@ static void unique_in_pack(struct packed_git *p,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		nth_packed_object_id(&oid, p, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
>  			break;
>  		update_candidates(ds, &oid);
>  	}
> @@ -188,6 +194,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
>  	struct multi_pack_index *m;
>  	struct packed_git *p;
>  
> +	/* Skip, unless oids from the storage hash algorithm are wanted */
> +	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> +		return;
> +
>  	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
>  	     m = m->next)
>  		unique_in_midx(m, ds);
> @@ -326,11 +336,12 @@ int set_disambiguate_hint_config(const char *var, const char *value)
>  
>  static int init_object_disambiguation(struct repository *r,
>  				      const char *name, int len,
> +				      const struct git_hash_algo *algo,
>  				      struct disambiguate_state *ds)
>  {
>  	int i;
>  
> -	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
> +	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
>  		return -1;

Isn't this loosening things up a bit too much? I'd have expected that we
would compare with `algo->hexsz`, unless `GET_OID_HASH_ANY` is set and
thus `algo == NULL`.

Patrick

>  	memset(ds, 0, sizeof(*ds));
> @@ -357,6 +368,7 @@ static int init_object_disambiguation(struct repository *r,
>  	ds->len = len;
>  	ds->hex_pfx[len] = '\0';
>  	ds->repo = r;
> +	ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;
>  	prepare_alt_odb(r);
>  	return 0;
>  }
> @@ -491,9 +503,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
>  	return collect_ambiguous(oid, data);
>  }
>  
> -static int sort_ambiguous(const void *a, const void *b, void *ctx)
> +static int sort_ambiguous(const void *va, const void *vb, void *ctx)
>  {
>  	struct repository *sort_ambiguous_repo = ctx;
> +	const struct object_id *a = va, *b = vb;
>  	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
>  	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
>  	int a_type_sort;
> @@ -503,8 +516,12 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
>  	 * Sorts by hash within the same object type, just as
>  	 * oid_array_for_each_unique() would do.
>  	 */
> -	if (a_type == b_type)
> -		return oidcmp(a, b);
> +	if (a_type == b_type) {
> +		if (a->algo == b->algo)
> +			return oidcmp(a, b);
> +		else
> +			return a->algo > b->algo ? 1 : -1;
> +	}
>  
>  	/*
>  	 * Between object types show tags, then commits, and finally
> @@ -533,8 +550,12 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  	int status;
>  	struct disambiguate_state ds;
>  	int quietly = !!(flags & GET_OID_QUIETLY);
> +	const struct git_hash_algo *algo = r->hash_algo;
> +
> +	if (flags & GET_OID_HASH_ANY)
> +		algo = NULL;
>  
> -	if (init_object_disambiguation(r, name, len, &ds) < 0)
> +	if (init_object_disambiguation(r, name, len, algo, &ds) < 0)
>  		return -1;
>  
>  	if (HAS_MULTI_BITS(flags & GET_OID_DISAMBIGUATORS))
> @@ -588,7 +609,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  		if (!ds.ambiguous)
>  			ds.fn = NULL;
>  
> -		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
> +		repo_for_each_abbrev(r, ds.hex_pfx, algo, collect_ambiguous, &collect);
>  		sort_ambiguous_oid_array(r, &collect);
>  
>  		if (oid_array_for_each(&collect, show_ambiguous_object, &out))
> @@ -610,13 +631,14 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  }
>  
>  int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo,
>  			 each_abbrev_fn fn, void *cb_data)
>  {
>  	struct oid_array collect = OID_ARRAY_INIT;
>  	struct disambiguate_state ds;
>  	int ret;
>  
> -	if (init_object_disambiguation(r, prefix, strlen(prefix), &ds) < 0)
> +	if (init_object_disambiguation(r, prefix, strlen(prefix), algo, &ds) < 0)
>  		return -1;
>  
>  	ds.always_call_fn = 1;
> @@ -787,10 +809,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
>  int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  			      const struct object_id *oid, int len)
>  {
> +	const struct git_hash_algo *algo =
> +		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
>  	struct disambiguate_state ds;
>  	struct min_abbrev_data mad;
>  	struct object_id oid_ret;
> -	const unsigned hexsz = r->hash_algo->hexsz;
> +	const unsigned hexsz = algo->hexsz;
>  
>  	if (len < 0) {
>  		unsigned long count = repo_approximate_object_count(r);
> @@ -826,7 +850,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  
>  	find_abbrev_len_packed(&mad);
>  
> -	if (init_object_disambiguation(r, hex, mad.cur_len, &ds) < 0)
> +	if (init_object_disambiguation(r, hex, mad.cur_len, algo, &ds) < 0)
>  		return -1;
>  
>  	ds.fn = repo_extend_abbrev_len;
> diff --git a/object-name.h b/object-name.h
> index 9ae522307148..064ddc97d1fe 100644
> --- a/object-name.h
> +++ b/object-name.h
> @@ -67,7 +67,8 @@ enum get_oid_result get_oid_with_context(struct repository *repo, const char *st
>  
>  
>  typedef int each_abbrev_fn(const struct object_id *oid, void *);
> -int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
> +int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo, each_abbrev_fn, void *);
>  
>  int set_disambiguate_hint_config(const char *var, const char *value);
>  
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-2-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2282 bytes --]

On Sun, Oct 01, 2023 at 09:40:06PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> While looking at how to handle input of both SHA-1 and SHA-256 oids in
> get_oid_with_context, I realized that the oid_array in
> repo_for_each_abbrev might have more than one kind of oid stored in it
> simultaneously.
> 
> Update to oid_array_append to ensure that oids added to an oid array
> always have an algorithm set.
> 
> Update void_hashcmp to first verify two oids use the same hash algorithm
> before comparing them to each other.
> 
> With that oid-array should be safe to use with different kinds of
> oids simultaneously.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  oid-array.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/oid-array.c b/oid-array.c
> index 8e4717746c31..1f36651754ed 100644
> --- a/oid-array.c
> +++ b/oid-array.c
> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>  {
>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>  	oidcpy(&array->oid[array->nr++], oid);
> +	if (!oid->algo)
> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);

I feel like it's a design wart that `oid_array_append()` now started to
depend on repository discovery, adding an external dependency to it that
may cause very confusing behaviour. Are there for example ever cases
where we populate such an OID array before we have discovered the repo?
Can it happen that we use OID arrays in the context of a submodule that
has a different object ID than the main repository?

>  	array->sorted = 0;
>  }
>  
> -static int void_hashcmp(const void *a, const void *b)
> +static int void_hashcmp(const void *va, const void *vb)
>  {
> -	return oidcmp(a, b);
> +	const struct object_id *a = va, *b = vb;
> +	int ret;
> +	if (a->algo == b->algo)
> +		ret = oidcmp(a, b);
> +	else
> +		ret = a->algo > b->algo ? 1 : -1;

Okay, so we basically end up sorting first by the algorithm, and then we
sort all object IDs of a specific algorithm relative to each other.

Patrick

> +	return ret;
>  }
>  
>  void oid_array_sort(struct oid_array *array)
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman
In-Reply-To: <20231002024034.2611-1-ebiederm@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5851 bytes --]

On Sun, Oct 01, 2023 at 09:40:05PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Two basic functions are provided:
> - convert_object_file Takes an object file it's type and hash algorithm
>   and converts it into the equivalent object file that would
>   have been generated with hash algorithm "to".
> 
>   For blob objects there is no conversation to be done and it is an
>   error to use this function on them.
> 
>   For commit, tree, and tag objects embedded oids are replaced by the
>   oids of the objects they refer to with those objects and their
>   object ids reencoded in with the hash algorithm "to".  Signatures
>   are rearranged so that they remain valid after the object has
>   been reencoded.
> 
> - repo_oid_to_algop which takes an oid that refers to an object file
>   and returns the oid of the equivalent object file generated
>   with the target hash algorithm.
> 
> The pair of files object-file-convert.c and object-file-convert.h are
> introduced to hold as much of this logic as possible to keep this
> conversion logic cleanly separated from everything else and in the
> hopes that someday the code will be clean enough git can support
> compiling out support for sha1 and the various conversion functions.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  Makefile              |  1 +
>  object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++
>  object-file-convert.h | 24 ++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100644 object-file-convert.c
>  create mode 100644 object-file-convert.h
> 
> diff --git a/Makefile b/Makefile
> index 577630936535..f7e824f25cda 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
>  LIB_OBJS += notes-merge.o
>  LIB_OBJS += notes-utils.o
>  LIB_OBJS += notes.o
> +LIB_OBJS += object-file-convert.o
>  LIB_OBJS += object-file.o
>  LIB_OBJS += object-name.o
>  LIB_OBJS += object.o
> diff --git a/object-file-convert.c b/object-file-convert.c
> new file mode 100644
> index 000000000000..4777aba83636
> --- /dev/null
> +++ b/object-file-convert.c
> @@ -0,0 +1,57 @@
> +#include "git-compat-util.h"
> +#include "gettext.h"
> +#include "strbuf.h"
> +#include "repository.h"
> +#include "hash-ll.h"
> +#include "object.h"
> +#include "object-file-convert.h"
> +
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest)
> +{
> +	/*
> +	 * If the source algorithm is not set, then we're using the
> +	 * default hash algorithm for that object.
> +	 */
> +	const struct git_hash_algo *from =
> +		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
> +
> +	if (from == to) {
> +		if (src != dest)
> +			oidcpy(dest, src);
> +		return 0;
> +	}
> +	return -1;
> +}

In it's current form, `repo_oid_to_algop()` basically never does
anything except for copying over the object ID because we do not handle
the case where object hashes are different. I assume this is intended,
as we basically only provide stubs in this commit. But still, it would
help to document this in-code as well with a comment.

> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle)
> +{
> +	int ret;
> +
> +	/* Don't call this function when no conversion is necessary */
> +	if ((from == to) || (type == OBJ_BLOB))
> +		BUG("Refusing noop object file conversion");

The extra braces around comparisons are unneeded and to the best of my
knowledge not customary in our code base. Also, error messages should
start with a lower-case letter.

> +	switch (type) {
> +	case OBJ_COMMIT:
> +	case OBJ_TREE:
> +	case OBJ_TAG:
> +	default:
> +		/* Not implemented yet, so fail. */
> +		ret = -1;
> +		break;
> +	}

It's a bit weird that we handle all object types except for blobs
separately, and then still have a `default` statement. I would've
thought that we should handle the object types specifically and set `ret
= -1` for all of them, and then the `default` case would instead call
`BUG()` due to an unknown object type.

> +	if (!ret)
> +		return 0;
> +	if (gentle) {
> +		strbuf_release(outbuf);
> +		return ret;
> +	}

Do you really intend to call `strbuf_release()` on the caller provided
buffer, or should this rather be `strbuf_reset()`? Memory management of
such an in/out parameter should typically be handled by the caller, not
the callee.

> +	die(_("Failed to convert object from %s to %s"),
> +		from->name, to->name);
> +}

The error message should start with a lower-case letter.

> diff --git a/object-file-convert.h b/object-file-convert.h
> new file mode 100644
> index 000000000000..a4f802aa8eea
> --- /dev/null
> +++ b/object-file-convert.h
> @@ -0,0 +1,24 @@
> +#ifndef OBJECT_CONVERT_H
> +#define OBJECT_CONVERT_H
> +
> +struct repository;
> +struct object_id;
> +struct git_hash_algo;
> +struct strbuf;
> +#include "object.h"
> +
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest);
> +
> +/*
> + * Convert an object file from one hash algorithm to another algorithm.
> + * Return -1 on failure, 0 on success.
> + */
> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle);

It would be nice to document what `gentle` does.

Patrick

> +#endif /* OBJECT_CONVERT_H */
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Git difftool: interaction between --dir-diff and --trust-exit-code
From: Jean-Rémy Falleri @ 2024-02-15 11:09 UTC (permalink / raw)
  To: git

Hi all and thanks for the amazing work you do on Git.

It seems that the —trust-exit-code option from git-difftool is not working when one use —dir-diff.

As an example I have set up the following configuration :

[difftool "false"]
cmd = false

And when I launch git-difftool in normal mode with —trust-exit-code, it works fine:
$ git difftool -y -t false --trust-exit-code HEAD HEAD~1
$ echo $?
> 128

However the same command in dir-diff mode is not working :
$ git difftool -t false -d --trust-exit-code HEAD HEAD~1
$ echo $?
> 0

From what I read in git/git-difftool—helper.sh it seems to not forward the exit status when $GIT_DIFFTOOL_DIRDIFF is on.

I believe there is nothing in the documentation about this interaction. Maybe this is intended but I find that this could be useful to have this option working on dir-diff mode too. For instance in my use case I would want to signal an error when I detect breaking changes with the breaking change detector we are working on, that is hooked as a difftool.

Best regards!
---
Jean-Rémy Falleri (http://www.labri.fr/~falleri)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox