Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v3 8/8] reftable/merged: transfer ownership of records when iterating
From: Patrick Steinhardt @ 2024-01-03  6:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2156 bytes --]

When iterating over records with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.

The handover to the caller is quite inefficient though because we first
deallocate the record passed in by the caller and then copy over the new
record, which requires us to reallocate memory.

Refactor the code to instead transfer ownership of the new record to the
caller. So instead of reallocating all contents, we now release the old
record and then copy contents of the new record into place.

The following benchmark of `git show-ref --quiet` in a repository with
around 350k refs shows a clear improvement. Before:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 21,163 bytes in 193 blocks
      total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated

This shows that we now have roundabout a single allocation per record
that we're yielding from the iterator. Ideally, we'd also get rid of
this allocation so that the number of allocations doesn't scale with the
number of refs anymore. This would require some larger surgery though
because the memory is owned by the priority queue before transferring it
over to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index a28bb99aaf..a52914d667 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -124,10 +124,12 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		reftable_record_release(&top.rec);
 	}

-	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_release(rec);
+	*rec = entry.rec;

 done:
-	reftable_record_release(&entry.rec);
+	if (err)
+		reftable_record_release(&entry.rec);
 	return err;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: Concurrent fetch commands
From: Patrick Steinhardt @ 2024-01-03  7:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Haller, git
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3123 bytes --]

On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
> 
> > I can reliably reproduce this by doing
> >
> >    $ git fetch&; sleep 0.1; git pull
> >    [1] 42160
> >    [1]  + done       git fetch
> >    fatal: Cannot rebase onto multiple branches.
> 
> I see a bug here.
> 
> How this _ought_ to work is
> 
>  - The first "git fetch" wants to report what it fetched by writing
>    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
>    the fetch finishes can consume its contents).
> 
>  - The second "git pull" runs "git fetch" under the hood.  Because
>    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
>    is already somebody writing to the file, it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.
> 
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and GIGO result follows.
> 
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left.  We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.

While I agree that it's the right thing to use "--no-write-fetch-head"
in this context, I still wonder whether we want to fix this "bug". It
would be a rather easy change on our side to start using the lockfile
API to write to FETCH_HEAD, which has a bunch of benefits:

  - We would block concurrent processes of writing to FETCH_HEAD at the
    same time (well, at least for clients aware of the new semantics).

  - Consequentially, we do not write a corrupted FETCH_HEAD anymore when
    multiple processes write to it at the same time.

  - We're also more robust against corruption in the context of hard
    crashes due to atomic rename semantics and proper flushing.

I don't really see much of a downside except for the fact that we change
how this special ref is being written to, so other implementations would
need to adapt accordingly. But even if they didn't, if clients with both
the new and old behaviour write FETCH_HEAD at the same point in time the
result would still be a consistent FETCH_HEAD if both writes finish. We
do have a race now which of both versions of FETCH_HEAD we see, but that
feels better than corrupted contents to me.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Chandra Pratap via GitGitGadget @ 2024-01-03  7:58 UTC (permalink / raw)
  To: git; +Cc: Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1628.git.1703955246308.gitgitgadget@gmail.com>

From: Chandra Pratap <chandrapratap3519@gmail.com>

Among Git's environment variable, the ones marked as "Boolean"
accept values in a way similar to Boolean configuration variables,
i.e. values like 'yes', 'on', 'true' and positive numbers are
taken as "on" and values like 'no', 'off', 'false' are taken as
"off".
GIT_FLUSH can be used to force Git to use non-buffered I/O when
writing to stdout. It can only accept two values, '1' which causes
Git to flush more often and '0' which makes all output buffered.
Make GIT_FLUSH accept more values besides '0' and '1' by turning it
into a Boolean environment variable, modifying the required logic.
Update the related documentation.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
---
    write-or-die: make GIT_FLUSH a Boolean environment variable
    
    Helped-by: Torsten Bögershausen <tboegi@web.de>

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1628%2FChand-ra%2Fgit_flush-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1628/Chand-ra/git_flush-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1628

Range-diff vs v1:

 1:  2123b43a080 ! 1:  585c76fff68 write-or-die: make GIT_FLUSH a Boolean environment variable
     @@ write-or-die.c: void maybe_flush_or_die(FILE *f, const char *desc)
       	static int skip_stdout_flush = -1;
       	struct stat st;
      -	char *cp;
     -+	int cp;
       
       	if (f == stdout) {
       		if (skip_stdout_flush < 0) {
     @@ write-or-die.c: void maybe_flush_or_die(FILE *f, const char *desc)
      -			cp = getenv("GIT_FLUSH");
      -			if (cp)
      -				skip_stdout_flush = (atoi(cp) == 0);
     -+			cp = git_env_bool("GIT_FLUSH", -1);
     -+			if (cp >= 0)
     -+				skip_stdout_flush = (cp == 0);
     - 			else if ((fstat(fileno(stdout), &st) == 0) &&
     +-			else if ((fstat(fileno(stdout), &st) == 0) &&
     ++			if (!git_env_bool("GIT_FLUSH", -1))
     ++				skip_stdout_flush = 1;
     ++			else if (!fstat(fileno(stdout), &st) &&
       				 S_ISREG(st.st_mode))
       				skip_stdout_flush = 1;
     + 			else


 Documentation/git.txt | 16 +++++++---------
 write-or-die.c        |  9 +++------
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 2535a30194f..83fd60f2d11 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -724,16 +724,14 @@ for further details.
 	waiting for someone with sufficient permissions to fix it.
 
 `GIT_FLUSH`::
-// NEEDSWORK: make it into a usual Boolean environment variable
-	If this environment variable is set to "1", then commands such
+	If this Boolean environment variable is set to true, then commands such
 	as 'git blame' (in incremental mode), 'git rev-list', 'git log',
-	'git check-attr' and 'git check-ignore' will
-	force a flush of the output stream after each record have been
-	flushed. If this
-	variable is set to "0", the output of these commands will be done
-	using completely buffered I/O.   If this environment variable is
-	not set, Git will choose buffered or record-oriented flushing
-	based on whether stdout appears to be redirected to a file or not.
+	'git check-attr' and 'git check-ignore' will force a flush of the output
+	stream after each record have been flushed. If this variable is set to
+	false, the output of these commands will be done using completely buffered
+	I/O. If this environment variable is not set, Git will choose buffered or
+	record-oriented flushing based on whether stdout appears to be redirected
+	to a file or not.
 
 `GIT_TRACE`::
 	Enables general trace messages, e.g. alias expansion, built-in
diff --git a/write-or-die.c b/write-or-die.c
index 42a2dc73cd3..a6acabd329f 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -20,15 +20,12 @@ void maybe_flush_or_die(FILE *f, const char *desc)
 {
 	static int skip_stdout_flush = -1;
 	struct stat st;
-	char *cp;
 
 	if (f == stdout) {
 		if (skip_stdout_flush < 0) {
-			/* NEEDSWORK: make this a normal Boolean */
-			cp = getenv("GIT_FLUSH");
-			if (cp)
-				skip_stdout_flush = (atoi(cp) == 0);
-			else if ((fstat(fileno(stdout), &st) == 0) &&
+			if (!git_env_bool("GIT_FLUSH", -1))
+				skip_stdout_flush = 1;
+			else if (!fstat(fileno(stdout), &st) &&
 				 S_ISREG(st.st_mode))
 				skip_stdout_flush = 1;
 			else

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
gitgitgadget

^ permalink raw reply related

* Re: Concurrent fetch commands
From: Patrick Steinhardt @ 2024-01-03  8:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Haller, git
In-Reply-To: <ZZUNxNciNb_xZveY@tanuki>

[-- Attachment #1: Type: text/plain, Size: 3619 bytes --]

On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:
> On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> > Stefan Haller <lists@haller-berlin.de> writes:
> > 
> > > I can reliably reproduce this by doing
> > >
> > >    $ git fetch&; sleep 0.1; git pull
> > >    [1] 42160
> > >    [1]  + done       git fetch
> > >    fatal: Cannot rebase onto multiple branches.
> > 
> > I see a bug here.
> > 
> > How this _ought_ to work is
> > 
> >  - The first "git fetch" wants to report what it fetched by writing
> >    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> >    the fetch finishes can consume its contents).
> > 
> >  - The second "git pull" runs "git fetch" under the hood.  Because
> >    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> >    is already somebody writing to the file, it should notice and
> >    barf, saying "fatal: a 'git fetch' is already working" or
> >    something.
> > 
> > But because there is no "Do not overwrite FETCH_HEAD somebody else
> > is using" protection, "git merge" or "git rebase" that is run as the
> > second half of the "git pull" ends up working on the contents of
> > FETCH_HEAD that is undefined, and GIGO result follows.
> > 
> > The "bug" that the second "git fetch" does not notice an already
> > running one (who is in possession of FETCH_HEAD) and refrain from
> > starting is not easy to design a fix for---we cannot just abort by
> > opening it with O_CREAT|O_EXCL because it is a normal thing for
> > $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> > its contents before starting to avoid getting affected by contents
> > leftover by the last fetch, but when there is a "git fetch" that is
> > actively running, and it finishes _after_ the second one starts and
> > truncates the file, the second one will end up seeing the contents
> > the first one left.  We have the "--no-write-fetch-head" option for
> > users to explicitly tell which invocation of "git fetch" should not
> > write FETCH_HEAD.
> 
> While I agree that it's the right thing to use "--no-write-fetch-head"
> in this context, I still wonder whether we want to fix this "bug". It
> would be a rather easy change on our side to start using the lockfile
> API to write to FETCH_HEAD, which has a bunch of benefits:
> 
>   - We would block concurrent processes of writing to FETCH_HEAD at the
>     same time (well, at least for clients aware of the new semantics).
> 
>   - Consequentially, we do not write a corrupted FETCH_HEAD anymore when
>     multiple processes write to it at the same time.
> 
>   - We're also more robust against corruption in the context of hard
>     crashes due to atomic rename semantics and proper flushing.
> 
> I don't really see much of a downside except for the fact that we change
> how this special ref is being written to, so other implementations would
> need to adapt accordingly. But even if they didn't, if clients with both
> the new and old behaviour write FETCH_HEAD at the same point in time the
> result would still be a consistent FETCH_HEAD if both writes finish. We
> do have a race now which of both versions of FETCH_HEAD we see, but that
> feels better than corrupted contents to me.

Ah, one thing I didn't think of is parallel fetches. It's expected that
all of the fetches write into FETCH_HEAD at the same point in time
concurrently, so the end result is the collection of updated refs even
though the order isn't guaranteed. That makes things a bit more
complicated indeed.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH V4 1/1] Replace SID with domain/username
From: Matthias Aßhauer @ 2024-01-03  8:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sören Krecker, Johannes Schindelin, git, sunshine
In-Reply-To: <xmqqa5pnckm4.fsf@gitster.g>



On Tue, 2 Jan 2024, Junio C Hamano wrote:

> Sören Krecker <soekkle@freenet.de> writes:
>
>> Replace SID with domain/username in error message, if owner of repository
>> and user are not equal on windows systems. Each user should have a unique
>> SID (https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/understand-security-identifiers#what-are-security-identifiers).
>
> That paragraph your URL refers to does say that a SID that is used
> for an account will never be reused to identify a different account.
> But I am not sure if it means a user will never be assigned more
> than one SID (in other words, the reverse is not necessarily true).

To my knowledge a user account will never have multiple active SIDs, but
the documentation of LookupAccountSidA [1] explicitly mentions that it 
does look up historic SIDs.

> In addition to looking up SIDs for local accounts, local domain 
> accounts, and explicitly trusted domain accounts, LookupAccountSid can 
> look up SIDs for any account in any domain in the forest, including SIDs 
> that appear only in the SIDhistory field of an account in the forest. 
> The SIDhistory field stores former SIDs of an account that has been
> moved from another domain.

[1] https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-lookupaccountsida#remarks

>
> The paragraph also mentions that a SID can identify a non-user
> entity like a computer account (as opposed to "a user account")---I
> do not know what its implications are in the context of this patch,
> though.
>
>> This means that domain/username is not a loss of information.
>
> This statement does not (grammatically) make sense, but more
> importantly, loss of information may not be a bad thing in this
> case.  If more than one SIDs are given to a user account and
> processes working for that account, these different SIDs may be
> translated, by using LookupAccountSidA(), to the same string for a
> single user@domain, and it would be an operation that loses
> information in that sense.
>
> But if what we *care* about is user@domain between the current
> process and the owner of the directory in question being the same
> (or not), then such a loss of information is a *good* thing.

This patch only changes the output of our error message, though.
It does not change what ownership information we actually compare.
So if we had a hypothetical user Bob that was part of the  domain 
example.com (SID S-1-5-21-100000001-1000000001-10000001-1001) and
had been moved over from the example.org domain (old SID S-1-5-21-
2000000002-2000000002-20000002-2002) and we would detect a repository 
owned by bobs old SID, we would now lookup the old SID, find it 
attached to a user named example.com\Bob, look up Bobs  current SID, find 
it belongs to a user named example.com\Bob and print a confusing error 
message.

> So I dunno.  Arguing what we care about (is that exact SID equality
> between the "owner of the directory" and the "user, which the
> current process is working on behalf of", or do we care about the
> equality of the "accounts"?) may be a better way to justify this
> change, if you ask me.
>
>> +static BOOL user_sid_to_user_name(PSID sid, LPSTR *str)
>> +{
>> +	SID_NAME_USE pe_use;
>> +	DWORD len_user = 0, len_domain = 0;
>> +	BOOL translate_sid_to_user;
>> +
>> +	/* returns only FALSE, because the string pointers are NULL*/
>> +	LookupAccountSidA(NULL, sid, NULL, &len_user, NULL, &len_domain,
>> +			  &pe_use);
>> +	/*Alloc needed space of the strings*/
>> +	ALLOC_ARRAY((*str), (size_t)len_domain + (size_t)len_user);
>> +	translate_sid_to_user = LookupAccountSidA(NULL, sid, (*str) + len_domain, &len_user,
>> +				   *str, &len_domain, &pe_use);
>> +	if (translate_sid_to_user == FALSE) {
>> +		FREE_AND_NULL(*str);
>> +	}
>
> Style: do not enclose a single-statement block inside {}.
>
>> +	else
>> +		(*str)[len_domain] = '/';
>> +	return translate_sid_to_user;
>> +}
>
>> @@ -2767,7 +2788,9 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
>>  		} else if (report) {
>>  			LPSTR str1, str2, to_free1 = NULL, to_free2 = NULL;
>>
>> -			if (ConvertSidToStringSidA(sid, &str1))
>> +			if (user_sid_to_user_name(sid, &str1))
>> +				to_free1 = str1;
>> +			else if (ConvertSidToStringSidA(sid, &str1))
>>  				to_free1 = str1;
>
> Do these two helper functions return pointers pointing into the same
> kind of memory that you can free with the same function?  That is ...
>
>> ...
>>  				    "'%s' is owned by:\n"
>>  				    "\t'%s'\nbut the current user is:\n"
>>  				    "\t'%s'\n", path, str1, str2);
>> -			LocalFree(to_free1);
>> -			LocalFree(to_free2);
>> +			free(to_free1);
>> +			free(to_free2);
>
> ... the original code seems to say that the piece of memory we
> obtain from ConvertSidToStringSidA() must not be freed by calling
> free() but use something special called LocalFree().  I am assuing
> that your user_sid_to_user_name() returns a regular piece of memory
> that can be freed by calling regular free()?  Do we need to keep
> track of where we got the memory from and use different function to
> free each variable, or something (again I do not do Windows so I'll
> defer all of these to Dscho, who is CC'ed this time).
>
> Thanks and a happy new year.
>
>>  		}
>>  	}
>
>

^ permalink raw reply

* Re: [PATCH v2] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Patrick Steinhardt @ 2024-01-03  8:22 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1628.v2.git.1704268708720.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

On Wed, Jan 03, 2024 at 07:58:28AM +0000, Chandra Pratap via GitGitGadget wrote:
[snip]
> diff --git a/write-or-die.c b/write-or-die.c
> index 42a2dc73cd3..a6acabd329f 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -20,15 +20,12 @@ void maybe_flush_or_die(FILE *f, const char *desc)
>  {
>  	static int skip_stdout_flush = -1;
>  	struct stat st;
> -	char *cp;
>  
>  	if (f == stdout) {
>  		if (skip_stdout_flush < 0) {
> -			/* NEEDSWORK: make this a normal Boolean */
> -			cp = getenv("GIT_FLUSH");
> -			if (cp)
> -				skip_stdout_flush = (atoi(cp) == 0);
> -			else if ((fstat(fileno(stdout), &st) == 0) &&
> +			if (!git_env_bool("GIT_FLUSH", -1))
> +				skip_stdout_flush = 1;

It's a bit surprising to pass `-1` as default value to `git_env_bool()`
here, as this value would hint that the caller wants to explicitly
handle the case where the "GIT_FLUSH" envvar is not set at all. We don't
though, and essentially fall back to "GIT_FLUSH=1", so passing `1` as
the fallback value would be less confusing.

Anyway, the resulting behaviour is the same regardless of whether we
pass `1` or `-1`, so I'm not sure whether this is worth a reroll.

Patrick

> +			else if (!fstat(fileno(stdout), &st) &&
>  				 S_ISREG(st.st_mode))
>  				skip_stdout_flush = 1;
>  			else
> 
> base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
> -- 
> gitgitgadget
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: What's the recommendation for forgetting all rerere's records?
From: Sean Allred @ 2024-01-03  8:30 UTC (permalink / raw)
  To: Sean Allred; +Cc: Junio C Hamano, Sean Allred, git
In-Reply-To: <35f24a01d15ce28932bb6be098d6a164a49cc542008f75673cd6221a9b24b578@mu.id>


(Doesn't look like this actually got picked up by lore when I originally
sent it at Fri, 22 Dec 2023 12:19:50 -0600. This is a re-send; apologies
if you get this message twice.)

==

There might be a bug here.

Junio C Hamano <gitster@pobox.com> writes:
     # Now the fun command you seem to have missed.  You MUST give
     # "git checkout --merge" a pathspec.  I do not encourage it but
     # using "." to say "unresolve everything under the sun" should
     # also work.
     $ git checkout --merge builtin/mv.c
     Recreated 1 merge conflict

Yep, I definitely missed this! Very handy, thank you :-)

     # You should then be able to correct the resolution with your
     # editor.
     $ edit builtin/mv.c

     # If this is one-time fix (you are happy with the original
     # resolution and wanted to deviate from it only once this time),
     # there is nothing else need to be done.  If you want to record
     # this as a new resolution, you'd get rid of the old one and
     # record this one.
     $ git rerere forget builtin/mv.c
     $ git rerere

It's taken some time to investigate on our end, but it appears that the
issue we're seeing is particular to git-rebase.

Consider a run using git-merge, which works perfectly as you describe:

    # Let's set up our conflict; most output here elided for brevity.
    $ git init
    $ echo aaa >file
    $ git add file
    $ git commit -ambase
    $ git branch feature
    $ echo bbb >file
    $ git commit -amremote
    $ git switch feature
    $ echo ccc >file
    $ git commit -amlocal
    $ git --no-pager log --oneline --graph --all
    * 6b33f42 (HEAD -> feature) local
    | * 7e189f6 (main) remote
    |/
    * c5901f5 base

    # Now for the fun part!
    $ git config rerere.enabled true

    $ git merge main
    Auto-merging file
    CONFLICT (content): Merge conflict in file
    Recorded preimage for 'file'
    Automatic merge failed; fix conflicts and then commit the result.

    $ echo 'bad merge' >file
    $ git commit -ammerge
    Recorded resolution for 'file'.
    [feature 75d45f0] merge

    # Ack! That merge was bad. Let's try that again.
    $ git reset --hard @^
    HEAD is now at 6b33f42 local

    $ git merge main
    Auto-merging file
    CONFLICT (content): Merge conflict in file
    Resolved 'file' using previous resolution.
    Automatic merge failed; fix conflicts and then commit the result.

    # Your method to correct this single bad merge works flawlessly:
    $ git checkout --merge file
    Recreated 1 merge conflict

    $ git rerere forget file
    Updated preimage for 'file'
    Forgot resolution for 'file'

    $ echo 'good merge' >file

    $ git commit -ammerge
    Recorded resolution for 'file'.
    [feature 1770541] merge

Let's compare this with git-rebase:

    # Same setup as before
    $ git init
    $ echo aaa >file
    $ git add file
    $ git commit -ambase
    $ git branch feature
    $ echo bbb >file
    $ git commit -amremote
    $ git switch feature
    $ echo ccc >file
    $ git commit -amlocal
    $ git --no-pager log --oneline --graph --all
    * b4d7aeb (HEAD -> feature) local
    | * 2a0978d (main) remote
    |/
    * 91140a6 base

    # Now for the fun part! Just like before, but we're going to use
    # git-rebase instead of git-merge.
    $ git config rerere.enabled true

    $ git rebase main
    Auto-merging file
    CONFLICT (content): Merge conflict in file
    error: could not apply b4d7aeb... local
    hint: Resolve all conflicts manually, mark them as resolved with
    hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
    hint: You can instead skip this commit: run "git rebase --skip".
    hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
    Recorded preimage for 'file'
    Could not apply b4d7aeb... local

    $ echo 'bad merge' >file
    $ git add file

    $ EDITOR=: git rebase --continue
    file: needs merge
    You must edit all merge conflicts and then
    mark them as resolved using git add

    $ git rebase --abort

    $ git rebase main
    Auto-merging file
    CONFLICT (content): Merge conflict in file
    error: could not apply b4d7aeb... local
    hint: Resolve all conflicts manually, mark them as resolved with
    hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
    hint: You can instead skip this commit: run "git rebase --skip".
    hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
    Recorded preimage for 'file'
    Could not apply b4d7aeb... local

    $ git checkout --merge .
    Recreated 1 merge conflict

    $ git rerere forget .
    error: no remembered resolution for 'file'

    $ echo 'good merge' >file

    $ EDITOR=: git rebase --continue
    file: needs merge
    You must edit all merge conflicts and then
    mark them as resolved using git add

Is this a bug?

--
Sean Allred

^ permalink raw reply

* Re: [PATCH 2/6] setup: move creation of "refs/" into the files backend
From: Patrick Steinhardt @ 2024-01-03  8:33 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZTR6aW5aoxcMOS3U3TL1VxSfmyVno9fu7B5201pJTqyyg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

On Tue, Jan 02, 2024 at 05:23:18AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Move the code to create the directory into the files backend itself to
> > make it so. This means that future ref backends will also need to have
> > equivalent logic around to ensure that the directory exists, but it
> > seems a lot more sensible to have it this way round than to require
> > callers to create the directory themselves.
> >
> 
> Why not move it to refs.c:refs_init_db() instead? this way each
> implementation doesn't have to do it?

True, that would be another possibility. But I think it is conceptually
unclean to split up creation of the refdb into multiple locations. The
"files" backend already has to create "refs/heads/" and "refs/tags/",
and the "reftable" backend will set up "refs/heads" as a file so that
it's impossible to create branches as loose files by accident. So both
do have specific knowledge around how exactly this directory hierarchy
should look like, and thus I think it is sensible to make the code self
contained inside of the backends.

My opinion on this would be different if we expected a proliferation of
backends to happen, but that's quite unlikely. Furthermore, we may at
some point decide that repos don't need "refs/" and/or "HEAD" at all
anymore, at which point it is easier to drop the creation of those files
and dirs from the reftable backend.

I'll update the commit message to include these considerations.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Patrick Steinhardt @ 2024-01-03  8:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Karthik Nayak, Junio C Hamano, git, christian.couder
In-Reply-To: <ZZRaOhK869S1Sg1h@nand.local>

[-- Attachment #1: Type: text/plain, Size: 3338 bytes --]

On Tue, Jan 02, 2024 at 01:47:22PM -0500, Taylor Blau wrote:
> On Tue, Jan 02, 2024 at 07:18:48AM -0800, Karthik Nayak wrote:
> > > As "git for-each-ref" takes pattern that is prefix match, e.g.,
> > >
> > >     $ git for-each-ref refs/remotes/
> > >
> > > shows everything like refs/remotes/origin/main that begins with
> > > refs/remotes/, I wonder if
> > >
> > >     $ git for-each-ref ""
> > >
> > > should mean what you are asking for.  After all, "git for-each-ref"
> > > does *not* take "--branches" and others like "git log" family to
> > > limit its operation to subhierarchy of "refs/" to begin with.
> >
> > But I don't think using an empty pattern is the best way to go forward.
> > This would break the pattern matching feature. For instance, what if the
> > user wanted to print all refs, but pattern match "*_HEAD"?
> >
> > Would that be
> >
> >       $ git for-each-ref "" "*_HEAD"
> >
> > I think this would be confusing, since the first pattern is now acting
> > as an option, since its not really filtering rather its changing the
> > search space.
> >
> > Maybe "--all-refs" or "--all-ref-types" instead?
> 
> I tend to agree that the special empty pattern would be a good shorthand
> for listing all references underneath refs/, including any top-level
> psuedo-refs.
> 
> But I don't think that I quite follow what Karthik is saying here.
> for-each-ref returns the union of references that match the given
> pattern(s), not their intersection. So if you wanted to list just the
> psudo-refs ending in '_HEAD', you'd do:
> 
>   $ git for-each-ref "*_HEAD"
> 
> I think if you wanted to list all pseudo-refs, calling the option
> `--pseudo-refs` seems reasonable. But if you want to list some subset of
> psueod-refs matching a given pattern, you should specify that pattern
> directly.

Where I think this proposal falls short is if you have refs outside of
the "refs/" hierarchy. Granted, this is nothing that should usually
happen nowadays. But I think we should safeguard us for the future:

  - There may be bugs in the reftable backend that allow for such refs
    to be created.

  - We may even eventually end up saying that it's valid for refs to not
    start with "refs/". I consider this to be mostly an artifact of how
    the files backend works, so it is not entirely unreasonable for us
    to eventually lift the restriction for the reftable backend.

I do not want to push for the second bullet point anytime soon, nor do I
have any plans to do so in the future. But regardless of that I would
really love to have a way to ask the ref backend for _any_ reference
that it knows of, regardless of its prefix. Otherwise it becomes next to
impossible for a user to learn about what the reftable binary-format
actually contains. So I think that the current focus on pseudo-refs is
too short-sighted, and would want to aim for a more complete solution to
this problem.

This could be in the form of a `--all-refs` flag that gets translated
into a new `DO_FOR_EACH_REF_ALL_REFS` bit, which would indicate to the
ref backend to also enumerate refs outside of the "refs/" hierarchy.
This is orthogonal to the already existing `--all` pseudo-opt, because
`--all` would only ever enumerate refs inside of the "refs/" hierarchy.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Jeff King @ 2024-01-03  9:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git
In-Reply-To: <xmqq5y0bcjpw.fsf@gitster.g>

On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:

> * jk/t1006-cat-file-objectsize-disk (2023-12-21) 1 commit
>   (merged to 'next' on 2023-12-28 at d82812e636)
>  + t1006: add tests for %(objectsize:disk)
> 
>  Test update.
> 
>  Will merge to 'master'.
>  source: <20231221094722.GA570888@coredump.intra.peff.net>

It looks like this is the original version. I posted a v2 that took
René's suggestion to swap out the awk for shell, but it got overlooked.
I'm happy enough either way, but if we want to salvage that effort,
here's a patch which could go on top:

-- >8 --
From: René Scharfe <l.s.r@web.de>
Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes

To compute the expected on-disk size of packed objects, we sort the
output of show-index by pack offset and then compute the difference
between adjacent entries using awk. This works but has a few readability
problems:

  1. Reading the index in pack order means don't find out the size of an
     oid's entry until we see the _next_ entry. So we have to save it to
     print later.

     We can instead iterate in reverse order, so we compute each oid's
     size as we see it.

  2. Since the awk invocation is inside a text_expect block, we can't
     easily use single-quotes to hold the script. So we use
     double-quotes, but then have to escape the dollar signs in the awk
     script.

     We can swap this out for a shell loop instead (which is made much
     easier by the first change).

Signed-off-by: Jeff King <peff@peff.net>
---
I gave René authorship since this was his cleverness, but obviously I
wrote the commit message. Giving an explicit signoff would be nice,
though.

 t/t1006-cat-file.sh | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 0c2eafae65..5ea3326128 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
 		while read idx
 		do
 			git show-index <"$idx" >idx.raw &&
-			sort -n <idx.raw >idx.sorted &&
+			sort -nr <idx.raw >idx.sorted &&
 			packsz=$(test_file_size "${idx%.idx}.pack") &&
 			end=$((packsz - rawsz)) &&
-			awk -v end="$end" "
-			  NR > 1 { print oid, \$1 - start }
-			  { start = \$1; oid = \$2 }
-			  END { print oid, end - start }
-			" idx.sorted ||
+			while read start oid rest
+			do
+				size=$((end - start)) &&
+				end=$start &&
+				echo "$oid $size" ||
+				return 1
+			done <idx.sorted ||
 			return 1
 		done
 	} >expect.raw &&
-- 
2.43.0.514.g7147b80757


^ permalink raw reply related

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Karthik Nayak @ 2024-01-03 10:22 UTC (permalink / raw)
  To: Patrick Steinhardt, Taylor Blau; +Cc: Junio C Hamano, git, christian.couder
In-Reply-To: <ZZUgUUlB8A-rhep5@tanuki>

[-- Attachment #1: Type: text/plain, Size: 3991 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Jan 02, 2024 at 01:47:22PM -0500, Taylor Blau wrote:
>> On Tue, Jan 02, 2024 at 07:18:48AM -0800, Karthik Nayak wrote:
>> > > As "git for-each-ref" takes pattern that is prefix match, e.g.,
>> > >
>> > >     $ git for-each-ref refs/remotes/
>> > >
>> > > shows everything like refs/remotes/origin/main that begins with
>> > > refs/remotes/, I wonder if
>> > >
>> > >     $ git for-each-ref ""
>> > >
>> > > should mean what you are asking for.  After all, "git for-each-ref"
>> > > does *not* take "--branches" and others like "git log" family to
>> > > limit its operation to subhierarchy of "refs/" to begin with.
>> >
>> > But I don't think using an empty pattern is the best way to go forward.
>> > This would break the pattern matching feature. For instance, what if the
>> > user wanted to print all refs, but pattern match "*_HEAD"?
>> >
>> > Would that be
>> >
>> >       $ git for-each-ref "" "*_HEAD"
>> >
>> > I think this would be confusing, since the first pattern is now acting
>> > as an option, since its not really filtering rather its changing the
>> > search space.
>> >
>> > Maybe "--all-refs" or "--all-ref-types" instead?
>>
>> I tend to agree that the special empty pattern would be a good shorthand
>> for listing all references underneath refs/, including any top-level
>> psuedo-refs.
>>
>> But I don't think that I quite follow what Karthik is saying here.
>> for-each-ref returns the union of references that match the given
>> pattern(s), not their intersection. So if you wanted to list just the
>> psudo-refs ending in '_HEAD', you'd do:
>>
>>   $ git for-each-ref "*_HEAD"
>>
>> I think if you wanted to list all pseudo-refs, calling the option
>> `--pseudo-refs` seems reasonable. But if you want to list some subset of
>> psueod-refs matching a given pattern, you should specify that pattern
>> directly.
>
> Where I think this proposal falls short is if you have refs outside of
> the "refs/" hierarchy. Granted, this is nothing that should usually
> happen nowadays. But I think we should safeguard us for the future:
>
>   - There may be bugs in the reftable backend that allow for such refs
>     to be created.
>
>   - We may even eventually end up saying that it's valid for refs to not
>     start with "refs/". I consider this to be mostly an artifact of how
>     the files backend works, so it is not entirely unreasonable for us
>     to eventually lift the restriction for the reftable backend.
>
> I do not want to push for the second bullet point anytime soon, nor do I
> have any plans to do so in the future. But regardless of that I would
> really love to have a way to ask the ref backend for _any_ reference
> that it knows of, regardless of its prefix. Otherwise it becomes next to
> impossible for a user to learn about what the reftable binary-format
> actually contains. So I think that the current focus on pseudo-refs is
> too short-sighted, and would want to aim for a more complete solution to
> this problem.
>
> This could be in the form of a `--all-refs` flag that gets translated
> into a new `DO_FOR_EACH_REF_ALL_REFS` bit, which would indicate to the
> ref backend to also enumerate refs outside of the "refs/" hierarchy.
> This is orthogonal to the already existing `--all` pseudo-opt, because
> `--all` would only ever enumerate refs inside of the "refs/" hierarchy.
>
> Patrick

Thanks Patrick. I think the confusion was because I was referring to add
a new command to print all refs, meaning all refs including the ones
outside the "refs/" folder.

The confusion was that I thought Junio was referring to using

    $ git for-each-ref ""

to print all refs under $GIT_DIR, while he was actually talking about
"$GIT_DIR/refs/" directory.

So to loop back, I'm suggesting to add `--all-refs` to print all the
refs under $GIT_DIR. This would include refs traditionally found under
"refs/" and other refs like pseudo refs, HEAD and perhaps user created
refs under $GIT_DIR.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: What's the recommendation for forgetting all rerere's records?
From: Sean Allred @ 2024-01-03  9:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <m0frzeu89w.fsf@epic96565.epic.com>


There was enough going on with that prior email that I gave it another
look and found some errors.

>     $ echo 'bad merge' >file
>     $ git add file
>
>     $ EDITOR=: git rebase --continue
>     file: needs merge
>     You must edit all merge conflicts and then
>     mark them as resolved using git add
>
>     $ git rebase --abort
>
>     $ git rebase main
>     Auto-merging file
>     CONFLICT (content): Merge conflict in file
>     error: could not apply b4d7aeb... local
>     hint: Resolve all conflicts manually, mark them as resolved with
>     hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
>     hint: You can instead skip this commit: run "git rebase --skip".
>     hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
>     Recorded preimage for 'file'
>     Could not apply b4d7aeb... local
>
>     $ git checkout --merge .
>     Recreated 1 merge conflict
>
>     $ git rerere forget .
>     error: no remembered resolution for 'file'
>
>     $ echo 'good merge' >file
>
>     $ EDITOR=: git rebase --continue
>     file: needs merge
>     You must edit all merge conflicts and then
>     mark them as resolved using git add

This section should have read:

    $ echo 'bad merge' >file
    $ git add file

    $ EDITOR=: git rebase --continue
    Recorded resolution for 'file'.
    [detached HEAD 5e3c431] local
     1 file changed, 1 insertion(+), 1 deletion(-)
    Successfully rebased and updated refs/heads/feature.

    $ git reset --hard @{1}
    HEAD is now at b4d7aeb local

    $ git rebase main
    Auto-merging file
    CONFLICT (content): Merge conflict in file
    error: could not apply b4d7aeb... local
    hint: Resolve all conflicts manually, mark them as resolved with
    hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
    hint: You can instead skip this commit: run "git rebase --skip".
    hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
    Resolved 'file' using previous resolution.
    Could not apply b4d7aeb... local

    $ git checkout --merge .
    Recreated 1 merge conflict

    $ git rerere forget .
    error: no remembered resolution for 'file'

I've driven myself a little nuts trying to reproduce it this morning,
but in doing so I've come to an important discovery: this bug presents
if `core.autocrlf=true` but does *not* present if `core.autocrlf=input`.

For completeness and future reference, the following script reproduces
the issue on Windows:

    git init
    echo aaa >file
    git add file
    git commit -ambase
    git branch feature
    echo bbb >file
    git commit -amremote
    git switch feature
    echo ccc >file
    git commit -amlocal
    git config rerere.enabled true
    git config core.autocrlf true     # <--

    # setup complete; let's rebase!
    git rebase main
    echo 'bad merge' >file
    git add file
    EDITOR=: git rebase --continue

    # uh oh; that was a bad resolution; let's try again
    git reset --hard @{1}
    git rebase main
    git checkout --merge .
    git rerere forget .               # fails
    echo 'good merge' >file
    git add file
    EDITOR=: git rebase --continue

At the end of this script, the 'bad merge' is still the recorded
resolution and no rerere record exists for the 'good merge'.

Just in case there's another piece of config somehow relevant, here's a
dump of the system that reproduced this:

    $ git config --list --show-scope | sort
    global	user.email=[clip]
    global	user.name=[clip]
    local	core.autocrlf=true
    local	core.bare=false
    local	core.filemode=false
    local	core.ignorecase=true
    local	core.logallrefupdates=true
    local	core.repositoryformatversion=0
    local	core.symlinks=false
    local	rerere.enabled=true
    system	core.autocrlf=input
    system	core.fscache=true
    system	core.fsmonitor=true
    system	core.symlinks=false
    system	credential.helper=manager
    system	credential.https://dev.azure.com.usehttppath=true
    system	diff.astextplain.textconv=astextplain
    system	filter.lfs.clean=git-lfs clean -- %f
    system	filter.lfs.process=git-lfs filter-process
    system	filter.lfs.required=true
    system	filter.lfs.smudge=git-lfs smudge -- %f
    system	http.sslbackend=schannel
    system	http.sslcainfo=C:/Program Files/Git/mingw64/etc/ssl/certs/ca-bundle.crt
    system	init.defaultbranch=main
    system	pull.rebase=true

    $ git --version
    git version 2.43.0.windows.1

It's worth noting at this point that while I believe I reproduced on
macOS last week, that doesn't jive with the available evidence (and I
can't reproduce it on macOS this morning, though I suspect that has more
to do with native use of LF over CRLF than anything else).

--
Sean Allred

^ permalink raw reply

* Re: Concurrent fetch commands
From: Patrick Steinhardt @ 2024-01-03 10:40 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: Junio C Hamano, Stefan Haller, git
In-Reply-To: <ZZU1TCyQdLqoLxPw@ugly>

[-- Attachment #1: Type: text/plain, Size: 2292 bytes --]

On Wed, Jan 03, 2024 at 11:22:04AM +0100, Oswald Buddenhagen wrote:
> On Wed, Jan 03, 2024 at 09:11:07AM +0100, Patrick Steinhardt wrote:
> > Ah, one thing I didn't think of is parallel fetches. It's expected that
> > all of the fetches write into FETCH_HEAD at the same point in time
> > concurrently
> > 
> is it, though? given that the contents could be already randomly scrambled,
> it would not seem particularly bad if the behavior changed.
> 
> the one real complication i see is the --append option, which requires using
> a waiting lock after the actual fetch, rather than acquiring it immediately
> and erroring out on failure (and ideally giving a hint to use
> --no-write-fetch-head).

I should probably clarify, but with "parallel fetches" I meant `git
fetch --jobs=`, not two separate executions of git-fetch(1). And these
do in fact use `--append` internally: the main process first truncates
FETCH_HEAD and then spawns its children, which will then append to
FETCH_HEAD in indeterministic order.

But even though the order is indeterministic, I wouldn't go as far as
claiming that the complete feature is broken. It works and records all
updated refs in FETCH_HEAD just fine, even if it's not particularly
elegant. Which to me shows that we should try hard not to break it.

> an extra complication is that concurrent runs with and without --append
> should be precluded, because that would again result in undefined behavior.
> it generally seems tricky to get --append straight if parallel fetches are
> supposed to work.

Yeah, the `--append` flag indeed complicates things. There are two ways
to handle this:

  - `--append` should refrain from running when there is a lockfile.
    This breaks `git fetch --jobs` without extra infra to handle this
    case, and furthermore a user may (rightfully?) expect that two
    manually spawned `git fetch --append` processes should work just
    fine.

  - `--append` should handle concurrency just fine, that is it knows to
    append to a preexisting lockfile. This is messy though, and the
    original creator of the lockfile wouldn't know when it can commit it
    into place.

Both options are kind of ugly, so I'm less sure now whether lockfiles
are the way to go.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Junio C Hamano @ 2024-01-03 14:38 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, Taylor Blau, git, christian.couder
In-Reply-To: <CAOLa=ZS4OOAmyRvf4HH-c_3GvnVkh6zS2kD3hEhRZ7NZT-rvyA@mail.gmail.com>

Karthik Nayak <karthik.188@gmail.com> writes:

> The confusion was that I thought Junio was referring to using
>
>     $ git for-each-ref ""
>
> to print all refs under $GIT_DIR, while he was actually talking about
> "$GIT_DIR/refs/" directory.

I do not think you misunderstood me here, though.  

When you have your master branch (refs/heads/master), your v1.0 tag
(refs/tags/v1.0), and the usual pseudorefs, giving "refs" to "git
for-each-ref" would yield refs/heads/master and refs/tags/v1.0 but
not HEAD and others, simply because the pattern "refs" in

    $ git for-each-ref "refs"

works as a hierarchy prefix match.  You give "refs/heads" and you
get only your master branch but not tags or HEAD in such a
repository.  As a natural extension to that behaviour, an empty
string as a hierarchy prefix that matches everything would work
well: you'd get HEAD, refs/heads/master, and refs/tags/v1.0 as an
empty prefix would cover all of the hiearchies these three refs (and
pseudorefs if you had ORIG_HEAD and MERGE_HEAD there) live in.

In any case, it is not a very much interesting to define the syntax
to tell for-each-ref not to limit itself under "refs/".  My point
was that you do not need a special option for that, as shown above.

What is more interesting is what to do with refs that are specific
to other worktrees, e.g.

    $ git rev-parse "worktrees/$name/refs/bisect/bad"

would currently let you peek into (and worse yet, muck with, if you
really wanted to, with something like "git update-ref") refs that
should be only visible in another worktree.  Should for-each-ref and
friends learn a way to iterate over them?  I have no answer to that
question.

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Taylor Blau @ 2024-01-03 15:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, Junio C Hamano, git, christian.couder
In-Reply-To: <ZZUgUUlB8A-rhep5@tanuki>

On Wed, Jan 03, 2024 at 09:52:33AM +0100, Patrick Steinhardt wrote:
> > I tend to agree that the special empty pattern would be a good shorthand
> > for listing all references underneath refs/, including any top-level
> > psuedo-refs.
> >
> > But I don't think that I quite follow what Karthik is saying here.
> > for-each-ref returns the union of references that match the given
> > pattern(s), not their intersection. So if you wanted to list just the
> > psudo-refs ending in '_HEAD', you'd do:
> >
> >   $ git for-each-ref "*_HEAD"
> >
> > I think if you wanted to list all pseudo-refs, calling the option
> > `--pseudo-refs` seems reasonable. But if you want to list some subset of
> > psueod-refs matching a given pattern, you should specify that pattern
> > directly.
>
> Where I think this proposal falls short is if you have refs outside of
> the "refs/" hierarchy. Granted, this is nothing that should usually
> happen nowadays. But I think we should safeguard us for the future:

Hmm. Maybe I misspoke, but I was thinking that `--pseudo-refs` would
imply that we list all references (regardless of whether they appear in
the top-level refs/ hierarchy). But perhaps I'm misunderstanding what
you're trying to accomplish here.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Patrick Steinhardt @ 2024-01-03 15:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, Taylor Blau, git, christian.couder
In-Reply-To: <xmqqwmsqbhyt.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 4047 bytes --]

On Wed, Jan 03, 2024 at 06:38:02AM -0800, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> 
> > The confusion was that I thought Junio was referring to using
> >
> >     $ git for-each-ref ""
> >
> > to print all refs under $GIT_DIR, while he was actually talking about
> > "$GIT_DIR/refs/" directory.
> 
> I do not think you misunderstood me here, though.  
> 
> When you have your master branch (refs/heads/master), your v1.0 tag
> (refs/tags/v1.0), and the usual pseudorefs, giving "refs" to "git
> for-each-ref" would yield refs/heads/master and refs/tags/v1.0 but
> not HEAD and others, simply because the pattern "refs" in
> 
>     $ git for-each-ref "refs"
> 
> works as a hierarchy prefix match.  You give "refs/heads" and you
> get only your master branch but not tags or HEAD in such a
> repository.  As a natural extension to that behaviour, an empty
> string as a hierarchy prefix that matches everything would work
> well: you'd get HEAD, refs/heads/master, and refs/tags/v1.0 as an
> empty prefix would cover all of the hiearchies these three refs (and
> pseudorefs if you had ORIG_HEAD and MERGE_HEAD there) live in.
> 
> In any case, it is not a very much interesting to define the syntax
> to tell for-each-ref not to limit itself under "refs/".  My point
> was that you do not need a special option for that, as shown above.

I think you're just stating that "it's possible, but not necessarily a
good idea" (let me know if I'm misinterpreting, I'm not quite sure
whether I read this correctly). Anyway, let me add my 2c here, even
though it may ultimately be completely moot.

The downside of an empty prefix is that you wouldn't be able to filter
refs outside of the "refs/" hierarchy in case we'd use the empty prefix.
A better alternative would be to use "/" as an indicator that you want
to list refs outside of "refs/". That'd allow for more flexible queries:

  - "/" prints all refs and pseudo refs, even those outside of the
    "refs/" hierarchy.

  - "/refs" prints your normal refs.

  - "/something/else" prints refs in "$GIT_DIR/something/else".

I'm not sure whether it's a better idea than using a flag and I'd assume
that the implementation would be more complex in that case because the
respective backends would need to special-case leading slashes.

So in the end I'm still in the camp that a flag is likely a better idea.

> What is more interesting is what to do with refs that are specific
> to other worktrees, e.g.
> 
>     $ git rev-parse "worktrees/$name/refs/bisect/bad"
> 
> would currently let you peek into (and worse yet, muck with, if you
> really wanted to, with something like "git update-ref") refs that
> should be only visible in another worktree.  Should for-each-ref and
> friends learn a way to iterate over them?  I have no answer to that
> question.

That's a good question indeed. I could certainly see an argument that
there should be the possibility to list them to get an allcompassing
view of the repository's refs. It's sure going to get more complex like
that though (which is not a good argument not to do this).

Currently, per-worktree refs are implemented as quasi-separate ref
stores (see `get_worktree_ref_store()`), and the reffiles backend will
indeed use completely separate stacks for each worktree. So ultimately
it makes me think that this is higher-level logic that the ref store
backend wouldn't need to be aware of, but that git-for-each-ref(1) or
related commands would need to handle.

So I'm not quite sure whether we should solve all these related problems
at once. If we were to implement these features via a flag then I could
see us using a value-flag with which you can control what exactly should
be included in the listing. So something like:

  - `--with-refs=repository` includes all refs of the current
    repository.

  - `--with-refs=worktrees` includes refs of all worktrees.

I dunno. I feel like I start to overthink this.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Patrick Steinhardt @ 2024-01-03 15:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Karthik Nayak, Junio C Hamano, git, christian.couder
In-Reply-To: <ZZWBLafB3pIlZqpw@nand.local>

[-- Attachment #1: Type: text/plain, Size: 1703 bytes --]

On Wed, Jan 03, 2024 at 10:45:49AM -0500, Taylor Blau wrote:
> On Wed, Jan 03, 2024 at 09:52:33AM +0100, Patrick Steinhardt wrote:
> > > I tend to agree that the special empty pattern would be a good shorthand
> > > for listing all references underneath refs/, including any top-level
> > > psuedo-refs.
> > >
> > > But I don't think that I quite follow what Karthik is saying here.
> > > for-each-ref returns the union of references that match the given
> > > pattern(s), not their intersection. So if you wanted to list just the
> > > psudo-refs ending in '_HEAD', you'd do:
> > >
> > >   $ git for-each-ref "*_HEAD"
> > >
> > > I think if you wanted to list all pseudo-refs, calling the option
> > > `--pseudo-refs` seems reasonable. But if you want to list some subset of
> > > psueod-refs matching a given pattern, you should specify that pattern
> > > directly.
> >
> > Where I think this proposal falls short is if you have refs outside of
> > the "refs/" hierarchy. Granted, this is nothing that should usually
> > happen nowadays. But I think we should safeguard us for the future:
> 
> Hmm. Maybe I misspoke, but I was thinking that `--pseudo-refs` would
> imply that we list all references (regardless of whether they appear in
> the top-level refs/ hierarchy). But perhaps I'm misunderstanding what
> you're trying to accomplish here.

Ah, okay. I think in that case it's simply a misunderstanding. To me a
pseudo-ref only includes refs that match `is_pseudoref_syntax()`, so
things like "HEAD", "ORIG_HEAD" or "MERGE_HEAD". So with that
understanding, a ref "something/outside/refs" would not be included,
but I'd very much like to see it listed.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Junio C Hamano @ 2024-01-03 16:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, Taylor Blau, git, christian.couder
In-Reply-To: <ZZWCXFghtql4i4YE@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> The downside of an empty prefix is that you wouldn't be able to filter
> refs outside of the "refs/" hierarchy in case we'd use the empty prefix.
> A better alternative would be to use "/" as an indicator that you want
> to list refs outside of "refs/". That'd allow for more flexible queries:
>
>   - "/" prints all refs and pseudo refs, even those outside of the
>     "refs/" hierarchy.
>
>   - "/refs" prints your normal refs.
>
>   - "/something/else" prints refs in "$GIT_DIR/something/else".

I do not get this at all, sorry.  What makes your "/" cover "refs/"
but not "something/"?  Unless you have some rule that special cases 
"/" to apply the "hierarchy prefix" matching rule unevenly, that is
not possible.  So you can easily lose the "/" all of your above
patterns share, go back to what I showed, and apply the morally
equivalent special case to an empty prefix and you'd be OK.

In any case, I do not think supporting anything other than
pseudorefs and HEAD outside "refs/" is a good idea to begin with
(the "worktrees/$name/" example), and requiring that all normal
references live inside "refs/" hierarchy is a good idea, so all of
the above is moot, I would say.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Patrick Steinhardt @ 2024-01-03 16:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, Taylor Blau, git, christian.couder
In-Reply-To: <xmqqsf3ebe1l.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2426 bytes --]

On Wed, Jan 03, 2024 at 08:02:46AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The downside of an empty prefix is that you wouldn't be able to filter
> > refs outside of the "refs/" hierarchy in case we'd use the empty prefix.
> > A better alternative would be to use "/" as an indicator that you want
> > to list refs outside of "refs/". That'd allow for more flexible queries:
> >
> >   - "/" prints all refs and pseudo refs, even those outside of the
> >     "refs/" hierarchy.
> >
> >   - "/refs" prints your normal refs.
> >
> >   - "/something/else" prints refs in "$GIT_DIR/something/else".
> 
> I do not get this at all, sorry.  What makes your "/" cover "refs/"
> but not "something/"? 

It does cover "something/". But...

> Unless you have some rule that special cases "/" to apply the
> "hierarchy prefix" matching rule unevenly, that is not possible.  So
> you can easily lose the "/" all of your above patterns share, go back
> to what I showed, and apply the morally equivalent special case to an
> empty prefix and you'd be OK.

... I think you're right -- I was arguing under the misassumption that
the typical rev-parse rules kicked in for git-for-each-ref(1) (e.g.
matching "heads/foo" to "refs/heads/foo"). But they don't, so my point
indeed becomes moot and I see what you're getting at now and agree with
you.

> In any case, I do not think supporting anything other than
> pseudorefs and HEAD outside "refs/" is a good idea to begin with
> (the "worktrees/$name/" example), and requiring that all normal
> references live inside "refs/" hierarchy is a good idea, so all of
> the above is moot, I would say.

Yeah, I'm on the same page: anything outside of "refs/" should not be
supported. But the problem is that tools like git-update-ref(1) don't
enforce this, so something like `git update-ref foo/bar HEAD` happily
creates "$GIT_DIR/foo/bar". And I bet there are other ways to write refs
at arbitrary paths.

With the files backend it's easy to see that this was created and can be
rectified. But with the reftable library you wouldn't be able to learn
about the existence of this ref at all if we ignored anything but
pseudo-refs and refs prefixed with "refs/".

So while I agree that we shouldn't endorse such refs, we need to at
least give an escape hatch in case such refs end up in the refdb anyway.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
From: Christian Couder @ 2024-01-03 16:33 UTC (permalink / raw)
  To: Zach FettersMoore; +Cc: Junio C Hamano, Zach FettersMoore via GitGitGadget, git
In-Reply-To: <CAEWN6q2XeDDLvSM-ik_-HVqpeyYZLWpPwoj2SUyB9L9NyMJPLw@mail.gmail.com>

(Sorry for replying only to Zach instead of everyone previously.)

On Wed, Dec 13, 2023 at 4:20 PM Zach FettersMoore
<zach.fetters@apollographql.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> >>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> >>> > Merge made by the 'ort' strategy.
> >>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
> >>>
> >>> Looking into this some it looks like it could be a bash config
> >>> difference? My machine always runs it all the way through vs
> >>> failing for recursion depth. Although that would also be an issue
> >>> which is solved by this fix.
> >>
> >> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> >> might have a smaller recursion limit than bash.
> >
> > That sounds quite bad. Does it have to be recursive (iow, if we can
> > rewrite the logic to be iterative instead, that would be a much better
> > way to fix the issue)?
>
> I don't think an iterative vs recursive approach fixes this
> particular issue, the root of the issue this patch is fixing
> is that lots of commits from the history of subtrees not
> being acted upon are being processed when they don't need to
> be. So the iterative approach would likely resolve the
> recursion limit issue for some shells, but in my instance
> I don't see a recursion limit error, it just takes an
> extraordinary amount of time to run the split command
> because of all the unnecessary processing which needs to be
> avoided which this patch fixes.

Fixing possible recursion might be an improvement on top of your
patch. But without your patch the test case it describes would anyway
take a lot more time than seems necessary. So I agree that your patch
should definitely be merged anyway.

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Junio C Hamano @ 2024-01-03 16:37 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, git
In-Reply-To: <20240103090152.GB1866508@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> It looks like this is the original version. I posted a v2 that took
> René's suggestion to swap out the awk for shell, but it got overlooked.
> I'm happy enough either way, but if we want to salvage that effort,
> here's a patch which could go on top:

Thanks.  I was happy enough with the old one and placed the updated
one on backburner.

A commit message that explains why this incremental update (i.e.,
rewrite from awk to a shell loop) is a good idea below does make it
worthwhile ;-)

> -- >8 --
> From: René Scharfe <l.s.r@web.de>
> Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes
>
> To compute the expected on-disk size of packed objects, we sort the
> output of show-index by pack offset and then compute the difference
> between adjacent entries using awk. This works but has a few readability
> problems:
>
>   1. Reading the index in pack order means don't find out the size of an
>      oid's entry until we see the _next_ entry. So we have to save it to
>      print later.
>
>      We can instead iterate in reverse order, so we compute each oid's
>      size as we see it.

If you go forward, you need "the end of the previous round" (which
is "the beginning of the current round") to be subtracted from "the
end of the current round".  If you go forward, you have to have "the
beginning of the previous round" (which is "the end of the current
round") from which you subtract "the beginning of the current round".

So from that point of view, the only difference is that you would
not be ready to emit in the first round, and you would need to emit
for the last entry after the loop.  Because we happen to have the
end of the last entry outside the loop, we can omit the awkwardness.

OK.  But iterating over a list backwards is a bit awkward ;-).

>   2. Since the awk invocation is inside a text_expect block, we can't
>      easily use single-quotes to hold the script. So we use
>      double-quotes, but then have to escape the dollar signs in the awk
>      script.

Yup.  The joy of shell quoting rules ;-)

> I gave René authorship since this was his cleverness, but obviously I
> wrote the commit message. Giving an explicit signoff would be nice,
> though.

Indeed.

>  t/t1006-cat-file.sh | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 0c2eafae65..5ea3326128 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
>  		while read idx
>  		do
>  			git show-index <"$idx" >idx.raw &&
> -			sort -n <idx.raw >idx.sorted &&
> +			sort -nr <idx.raw >idx.sorted &&
>  			packsz=$(test_file_size "${idx%.idx}.pack") &&
>  			end=$((packsz - rawsz)) &&
> -			awk -v end="$end" "
> -			  NR > 1 { print oid, \$1 - start }
> -			  { start = \$1; oid = \$2 }
> -			  END { print oid, end - start }
> -			" idx.sorted ||
> +			while read start oid rest
> +			do
> +				size=$((end - start)) &&
> +				end=$start &&
> +				echo "$oid $size" ||
> +				return 1
> +			done <idx.sorted ||
>  			return 1
>  		done
>  	} >expect.raw &&

This is totally unrelated tangent, but the way "show-index" gets
invoked in the above loop makes readers wonder how the caller found
out which $idx file to read.

Of course, the above loop sits downstream of a pipe

    find .git/objects/pack -type f -name \*.idx

which means that any user of "git show-index" must be intimately
familiar with how the object database is structured.  I wonder if we
want an extra layer of abstraction, similar to how the reference
database can have different backend implementation.

Anyway, will queue.  Thanks.

^ permalink raw reply

* Re: Concurrent fetch commands
From: Taylor Blau @ 2024-01-03 16:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Oswald Buddenhagen, Junio C Hamano, Stefan Haller, git
In-Reply-To: <ZZU5s4LKQF1NLgnC@tanuki>

On Wed, Jan 03, 2024 at 11:40:51AM +0100, Patrick Steinhardt wrote:
>   - `--append` should handle concurrency just fine, that is it knows to
>     append to a preexisting lockfile. This is messy though, and the
>     original creator of the lockfile wouldn't know when it can commit it
>     into place.
>
> Both options are kind of ugly, so I'm less sure now whether lockfiles
> are the way to go.

Interesting. Thinking a little bit about what you wrote here, I feel
like `--append[=<FETCH_HEAD>] would do what you need here. The creator
of the lockfile would commit it into place exactly when all children
have finished writing into the existing lockfile.

It seems like that could work, but I haven't poked around to figure out
whether or not that is the case. Regardless, supposing that it does
work, I wonder what users reasonably expect in the presence of multiple
'git fetch' operations. I suppose the answer is that they expect
concurrent fetches to be tolerated, but that the contents of FETCH_HEAD
(and of course the remote references) are consistent at the end of all
of the fetches.

Thanks,
Taylor

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Taylor Blau @ 2024-01-03 16:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq5y0bcjpw.fsf@gitster.g>

On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:
> * tb/merge-tree-write-pack (2023-10-23) 5 commits
>  - builtin/merge-tree.c: implement support for `--write-pack`
>  - bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
>  - bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
>  - bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
>  - bulk-checkin: extract abstract `bulk_checkin_source`
>
>  "git merge-tree" learned "--write-pack" to record its result
>  without creating loose objects.
>
>  Broken when an object created during a merge is needed to continue merge
>  cf. <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>
>  source: <cover.1698101088.git.me@ttaylorr.com>

Let's drop this one.

> * tb/pair-chunk-expect (2023-11-10) 8 commits
>  - midx: read `OOFF` chunk with `pair_chunk_expect()`
>  - midx: read `OIDL` chunk with `pair_chunk_expect()`
>  - commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
>  - commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
>  - commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
>  - commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
>  - chunk-format: introduce `pair_chunk_expect()` helper
>  - Merge branch 'jk/chunk-bounds-more' into HEAD
>
>  Further code clean-up.
>
>  Needs review.
>  source: <cover.1699569246.git.me@ttaylorr.com>

This one is on my list of things to look at, but probably not something
that I'll get to urgently before I've had a chance to clear my holiday
backlog. If you don't mind keeping it, that's fine, but I won't be upset
if it's easier to drop from 'seen' in the meantime.

> * tb/path-filter-fix (2023-10-18) 17 commits
>  - bloom: introduce `deinit_bloom_filters()`
>  - commit-graph: reuse existing Bloom filters where possible
>  - object.h: fix mis-aligned flag bits table
>  - commit-graph: drop unnecessary `graph_read_bloom_data_context`
>  - commit-graph.c: unconditionally load Bloom filters
>  - bloom: prepare to discard incompatible Bloom filters
>  - bloom: annotate filters with hash version
>  - commit-graph: new filter ver. that fixes murmur3
>  - repo-settings: introduce commitgraph.changedPathsVersion
>  - t4216: test changed path filters with high bit paths
>  - t/helper/test-read-graph: implement `bloom-filters` mode
>  - bloom.h: make `load_bloom_filter_from_graph()` public
>  - t/helper/test-read-graph.c: extract `dump_graph_info()`
>  - gitformat-commit-graph: describe version 2 of BDAT
>  - commit-graph: ensure Bloom filters are read with consistent settings
>  - revision.c: consult Bloom filters for root commits
>  - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>
>  The Bloom filter used for path limited history traversal was broken
>  on systems whose "char" is unsigned; update the implementation and
>  bump the format version to 2.
>
>  Expecting a reroll.
>  cf. <20231023202212.GA5470@szeder.dev>
>  source: <cover.1697653929.git.me@ttaylorr.com>

I was confused by this one, since I couldn't figure out which tests
Gábor was referring to here. I responded in [1], but haven't heard back
since the end of October.

I personally think that this is ready to go, and it would be nice to get
it out of the perpetual "cooking" state that it's in. So if Gábor is
around to reply and I'm indeed missing something, that would be great.
But in the meantime, I think that this is ready to go.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/ZUARCJ1MmqgXfS4i@nand.local/

^ permalink raw reply

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Taylor Blau @ 2024-01-03 17:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, Junio C Hamano, git, christian.couder
In-Reply-To: <ZZWCxIHf9ySEOWEJ@tanuki>

On Wed, Jan 03, 2024 at 04:52:36PM +0100, Patrick Steinhardt wrote:
> On Wed, Jan 03, 2024 at 10:45:49AM -0500, Taylor Blau wrote:
> > On Wed, Jan 03, 2024 at 09:52:33AM +0100, Patrick Steinhardt wrote:
> > > > I tend to agree that the special empty pattern would be a good shorthand
> > > > for listing all references underneath refs/, including any top-level
> > > > psuedo-refs.
> > > >
> > > > But I don't think that I quite follow what Karthik is saying here.
> > > > for-each-ref returns the union of references that match the given
> > > > pattern(s), not their intersection. So if you wanted to list just the
> > > > psudo-refs ending in '_HEAD', you'd do:
> > > >
> > > >   $ git for-each-ref "*_HEAD"
> > > >
> > > > I think if you wanted to list all pseudo-refs, calling the option
> > > > `--pseudo-refs` seems reasonable. But if you want to list some subset of
> > > > psueod-refs matching a given pattern, you should specify that pattern
> > > > directly.
> > >
> > > Where I think this proposal falls short is if you have refs outside of
> > > the "refs/" hierarchy. Granted, this is nothing that should usually
> > > happen nowadays. But I think we should safeguard us for the future:
> >
> > Hmm. Maybe I misspoke, but I was thinking that `--pseudo-refs` would
> > imply that we list all references (regardless of whether they appear in
> > the top-level refs/ hierarchy). But perhaps I'm misunderstanding what
> > you're trying to accomplish here.
>
> Ah, okay. I think in that case it's simply a misunderstanding. To me a
> pseudo-ref only includes refs that match `is_pseudoref_syntax()`, so
> things like "HEAD", "ORIG_HEAD" or "MERGE_HEAD". So with that
> understanding, a ref "something/outside/refs" would not be included,
> but I'd very much like to see it listed.

OK, I see: you're trying to add an option that lists all references
(including those outside of the top-level "refs/" hierarchy). But my
proposal to use `--pseudo-refs` was to list *just* those references
outside of the top-level hierarchy.

I wonder if we might want to do something else entirely, which is an
option which controls the top-level "namespace" of references that we
want to see. The behavior would then be to list all references under
"namespace" (which presumably would be "refs/" by default).

If you want to list references like something/outside/refs, your
namespace would then be --namespace="".

I think that this would be a bit more flexible than the current
suggestions, but I am also not as familiar as you are at this particular
problem :-).

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Junio C Hamano @ 2024-01-03 17:13 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1628.v2.git.1704268708720.gitgitgadget@gmail.com>

"Chandra Pratap via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  Documentation/git.txt | 16 +++++++---------
>  write-or-die.c        |  9 +++------
>  2 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 2535a30194f..83fd60f2d11 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -724,16 +724,14 @@ for further details.
>  	waiting for someone with sufficient permissions to fix it.
>  
>  `GIT_FLUSH`::
> -// NEEDSWORK: make it into a usual Boolean environment variable
> -	If this environment variable is set to "1", then commands such
> +	If this Boolean environment variable is set to true, then commands such
>  	as 'git blame' (in incremental mode), 'git rev-list', 'git log',
> -	'git check-attr' and 'git check-ignore' will
> -	force a flush of the output stream after each record have been
> -	flushed. If this
> -	variable is set to "0", the output of these commands will be done
> -	using completely buffered I/O.   If this environment variable is
> -	not set, Git will choose buffered or record-oriented flushing
> -	based on whether stdout appears to be redirected to a file or not.
> +	'git check-attr' and 'git check-ignore' will force a flush of the output
> +	stream after each record have been flushed. If this variable is set to
> +	false, the output of these commands will be done using completely buffered
> +	I/O. If this environment variable is not set, Git will choose buffered or
> +	record-oriented flushing based on whether stdout appears to be redirected
> +	to a file or not.

It is somewhat irritating to see that we need to change this many
lines to just change "0" to "false" and "1" to "true".  I wonder if
it becomes easier to grok if we changed the description into a sub
enumeration of three possibilities, but that would be outside the
scope of this change [*].

> diff --git a/write-or-die.c b/write-or-die.c
> index 42a2dc73cd3..a6acabd329f 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -20,15 +20,12 @@ void maybe_flush_or_die(FILE *f, const char *desc)
>  {
>  	static int skip_stdout_flush = -1;
>  	struct stat st;
> -	char *cp;
>  
>  	if (f == stdout) {
>  		if (skip_stdout_flush < 0) {
> -			/* NEEDSWORK: make this a normal Boolean */
> -			cp = getenv("GIT_FLUSH");
> -			if (cp)
> -				skip_stdout_flush = (atoi(cp) == 0);
> -			else if ((fstat(fileno(stdout), &st) == 0) &&
> +			if (!git_env_bool("GIT_FLUSH", -1))
> +				skip_stdout_flush = 1;
> +			else if (!fstat(fileno(stdout), &st) &&
>  				 S_ISREG(st.st_mode))
>  				skip_stdout_flush = 1;
>  			else

The above logic does not look correct to me, primarily because the
return value of git_env_bool() is inspected only once to see if it
is zero, and does not differentiate the "unset" case from other
cases.

Since git_env_bool(k, def) returns

    - "def" (-1 in this case) when k is not exported (in which case
      you need to do the "fstat" dance).

    - 0 when k is exported and has a string that is "false" (in
      which case you would want to set skip_stdout_flush to true).

    - 1 when k is exported and has a string that is "true" (in which
      case you would want to set skip_stdout_flush to false).

    - or dies if the string exported in k is bogus.

shouldn't it be more like

                        skip_stdout_flush = 0; /* assume flushing */
                        switch (git_env_bool("GIT_FLUSH", -1)) {
                        case 0: /* told not to flush */
                                skip_stdout_flush = 1;
                                ;;
                        case -1: /* unspecified */
                                if (!fstat(...) && S_ISREG())
                                        skip_stdout_flush = 1;
                                ;;
                        default: /* told to flush */
                                ;;
                        }

perhaps?


[Footnote]

 * If I were to do this change, and if I were to also improve the
   style of the documentation before I forget, the way I would do so
   probably is with a two-patch series:

    (1) update "0" and "1" in the documentation with "false" and
        "true", without reflowing the text at all, and update the
        code.

    (2) rewrite the documentation to use 3-possibility
        sub-enumeration for values (imitate the way how other
        variables, like `diff.algorithm`, that can choose from a
        set of handful possible values are described).

   These two changes can be done in either order, but perhaps (1) is
   much less controversial change than the other, so I'd probably do
   so first.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox