* [PATCH v2 9/9] SubmittingPatches: hyphenate non-ASCII
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
Git documentation does this with the exception of ancient release notes.
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index cb0dcce6a17..9283eb0ef71 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -699,7 +699,7 @@ message to an external program, and this is a handy way to drive
`git am`. However, if the message is MIME encoded, what is
piped into the program is the representation you see in your
`*Article*` buffer after unwrapping MIME. This is often not what
-you would want for two reasons. It tends to screw up non ASCII
+you would want for two reasons. It tends to screw up non-ASCII
characters (most notably in people's names), and also
whitespaces (fatal in patches). Running "C-u g" to display the
message in raw form before using "|" to run the pipe can work
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 8/9] SubmittingPatches: clarify GitHub artifact format
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
GitHub wraps artifacts generated by workflows in a .zip file.
Internally, workflows can package anything they like in them.
A recently generated failure artifact had the form:
windows-artifacts.zip
Length Date Time Name
--------- ---------- ----- ----
76001695 12-19-2023 01:35 artifacts.tar.gz
11005650 12-19-2023 01:35 tracked.tar.gz
--------- -------
87007345 2 files
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 8f79253c5cb..cb0dcce6a17 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -605,7 +605,8 @@ branches here: `https://github.com/<Your GitHub handle>/git/actions/workflows/ma
If a branch did not pass all test cases then it is marked with a red
+x+. In that case you can click on the failing job and navigate to
"ci/run-build-and-tests.sh" and/or "ci/print-test-failures.sh". You
-can also download "Artifacts" which are tarred (or zipped) archives
+can also download "Artifacts" which are zip archives containing
+tarred (or zipped) archives
with test data relevant for debugging.
Then fix the problem and push your fix to your GitHub fork. This will
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 7/9] SubmittingPatches: clarify GitHub visual
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
GitHub has two general forms for its states, sometimes they're a simple
colored object (e.g. green check or red x), and sometimes there's also a
colored container (e.g. green box or red circle) with containing that
object (e.g. check or x).
That's a lot of words to try to describe things, but in general, the key
for a failure is that it's recognized as an `x` and that it's associated
with the color red -- the color of course is problematic for people who
are red-green color-blind, but that's why they are paired with distinct
shapes.
Using the term `cross` doesn't really help.
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 4476b52a50f..8f79253c5cb 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -603,7 +603,7 @@ to your fork of Git on GitHub. You can monitor the test state of all your
branches here: `https://github.com/<Your GitHub handle>/git/actions/workflows/main.yml`
If a branch did not pass all test cases then it is marked with a red
-cross. In that case you can click on the failing job and navigate to
++x+. In that case you can click on the failing job and navigate to
"ci/run-build-and-tests.sh" and/or "ci/print-test-failures.sh". You
can also download "Artifacts" which are tarred (or zipped) archives
with test data relevant for debugging.
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 6/9] SubmittingPatches: improve extra tags advice
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
Current statistics show a strong preference to only capitalize the first
letter in a hyphenated tag, but that some guidance would be helpful:
git log |
perl -ne 'next unless /^\s+(?:Signed-[oO]ff|Acked)-[bB]y:/;
s/^\s+//;s/:.*/:/;print'|
sort|uniq -c|sort -n
2 Signed-off-By:
4 Signed-Off-by:
22 Acked-By:
47 Signed-Off-By:
2202 Acked-by:
95315 Signed-off-by:
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 3 +++
1 file changed, 3 insertions(+)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 31878cb70b7..4476b52a50f 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -368,6 +368,9 @@ While you can also create your own trailer if the situation warrants it, we
encourage you to instead use one of the common trailers in this project
highlighted above.
+Extra tags should only capitalize the very first letter, i.e. favor
+"Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
+
[[git-tools]]
=== Generate your patch using Git tools out of your commits.
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 5/9] SubmittingPatches: update extra tags list
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
Add items with at least 100 uses in the past three years:
- Co-authored-by
- Helped-by
- Mentored-by
- Suggested-by
git log --since=3.years|
perl -ne 'next unless /^\s+[A-Z][a-z]+-\S+:/;s/^\s+//;s/:.*/:/;print'|
sort|uniq -c|sort -n|grep '[0-9][0-9] '
14 Based-on-patch-by:
14 Original-patch-by:
17 Tested-by:
100 Suggested-by:
121 Co-authored-by:
163 Mentored-by:
274 Reported-by:
290 Acked-by:
450 Helped-by:
602 Reviewed-by:
14111 Signed-off-by:
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 58dfe405049..31878cb70b7 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -355,6 +355,14 @@ If you like, you can put extra tags at the end:
patch after a detailed analysis.
. `Tested-by:` is used to indicate that the person applied the patch
and found it to have the desired effect.
+. `Co-authored-by:` is used to indicate that people exchanged drafts
+ of a patch before submitting it.
+. `Helped-by:` is used to credit someone who suggested ideas for
+ changes without providing the precise changes in patch form.
+. `Mentored-by:` is used to credit someone with helping develop a
+ patch as part of a mentorship program (e.g., GSoC or Outreachy).
+. `Suggested-by:` is used to credit someone with suggesting the idea
+ for a patch.
While you can also create your own trailer if the situation warrants it, we
encourage you to instead use one of the common trailers in this project
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 4/9] SubmittingPatches: discourage new trailers
From: Josh Soref via GitGitGadget @ 2023-12-21 16:41 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
There seems to be consensus amongst the core Git community on a working
set of common trailers, and there are non-trivial costs to people
inventing new trailers (research to discover what they mean/how they
differ from existing trailers) such that inventing new ones is generally
unwarranted and not something to be recommended to new contributors.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 32e90238777..58dfe405049 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -356,8 +356,9 @@ If you like, you can put extra tags at the end:
. `Tested-by:` is used to indicate that the person applied the patch
and found it to have the desired effect.
-You can also create your own tag or use one that's in common usage
-such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
+While you can also create your own trailer if the situation warrants it, we
+encourage you to instead use one of the common trailers in this project
+highlighted above.
[[git-tools]]
=== Generate your patch using Git tools out of your commits.
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 3/9] SubmittingPatches: drop ref to "What's in git.git"
From: Josh Soref via GitGitGadget @ 2023-12-21 16:40 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
"What's in git.git" was last seen in 2010:
https://lore.kernel.org/git/?q=%22what%27s+in+git.git%22
https://lore.kernel.org/git/7vaavikg72.fsf@alter.siamese.dyndns.org/
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/SubmittingPatches | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index bce7f97815c..32e90238777 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -570,7 +570,7 @@ their trees themselves.
master).
* Read the Git mailing list, the maintainer regularly posts messages
- entitled "What's cooking in git.git" and "What's in git.git" giving
+ entitled "What's cooking in git.git" giving
the status of various proposed changes.
== GitHub CI[[GHCI]]
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 2/9] CodingGuidelines: write punctuation marks
From: Josh Soref via GitGitGadget @ 2023-12-21 16:40 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
- Match style in Release Notes
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/CodingGuidelines | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index af94ed3a75d..578587a4715 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -578,7 +578,7 @@ Externally Visible Names
. The variable name describes the effect of tweaking this knob.
The section and variable names that consist of multiple words are
- formed by concatenating the words without punctuations (e.g. `-`),
+ formed by concatenating the words without punctuation marks (e.g. `-`),
and are broken using bumpyCaps in documentation as a hint to the
reader.
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 1/9] CodingGuidelines: move period inside parentheses
From: Josh Soref via GitGitGadget @ 2023-12-21 16:40 UTC (permalink / raw)
To: git
Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref, Josh Soref
In-Reply-To: <pull.1623.v2.git.1703176865.gitgitgadget@gmail.com>
From: Josh Soref <jsoref@gmail.com>
The contents within parenthesis should be omittable without resulting
in broken text.
Eliding the parenthesis left a period to end a run without any content.
Signed-off-by: Josh Soref <jsoref@gmail.com>
---
Documentation/CodingGuidelines | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 8ed517a5ca0..af94ed3a75d 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -450,7 +450,7 @@ For C programs:
one of the approved headers that includes it first for you. (The
approved headers currently include "builtin.h",
"t/helper/test-tool.h", "xdiff/xinclude.h", or
- "reftable/system.h"). You do not have to include more than one of
+ "reftable/system.h".) You do not have to include more than one of
these.
- A C file must directly include the header files that declare the
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 0/9] Minor improvements to CodingGuidelines and SubmittingPatches
From: Josh Soref via GitGitGadget @ 2023-12-21 16:40 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, René Scharfe, Phillip Wood, Josh Soref,
Josh Soref
In-Reply-To: <pull.1623.git.1702975319.gitgitgadget@gmail.com>
These are a bunch of things I've run into over my past couple of attempts to
contribute to Git.
* Incremental punctuation/grammatical improvements
* Update extra tags suggestions based on common usage
* drop reference to an article that was discontinued over a decade ago
* update GitHub references
* harmonize non-ASCII while I'm here
Note that I'm trying to do things "in the neighborhood". It'll be slower
than me replacing things topically, but hopefully easier for others to
digest. My current estimate is a decade or two :).
Josh Soref (9):
CodingGuidelines: move period inside parentheses
CodingGuidelines: write punctuation marks
SubmittingPatches: drop ref to "What's in git.git"
SubmittingPatches: discourage new trailers
SubmittingPatches: update extra tags list
SubmittingPatches: improve extra tags advice
SubmittingPatches: clarify GitHub visual
SubmittingPatches: clarify GitHub artifact format
SubmittingPatches: hyphenate non-ASCII
Documentation/CodingGuidelines | 4 ++--
Documentation/SubmittingPatches | 27 ++++++++++++++++++++-------
2 files changed, 22 insertions(+), 9 deletions(-)
base-commit: 624eb90fa8f65a79396615f3c2842ac5a3743350
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1623%2Fjsoref%2Fdocumentation-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1623/jsoref/documentation-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1623
Range-diff vs v1:
1: b9a8eb6aa4e = 1: b9a8eb6aa4e CodingGuidelines: move period inside parentheses
2: c0db8336e51 = 2: c0db8336e51 CodingGuidelines: write punctuation marks
3: 22d66c5b78a = 3: 22d66c5b78a SubmittingPatches: drop ref to "What's in git.git"
4: e5c7f29af43 ! 4: eac2211332f SubmittingPatches: update extra tags list
@@ Metadata
Author: Josh Soref <jsoref@gmail.com>
## Commit message ##
- SubmittingPatches: update extra tags list
+ SubmittingPatches: discourage new trailers
- Add items with at least 100 uses:
- - Co-authored-by
- - Helped-by
- - Mentored-by
- - Suggested-by
-
- Updating the create suggestion to something less commonly used.
-
- git log |
- perl -ne 'next unless /^\s+[A-Z][a-z]+-\S+:/;s/^\s+//;s/:.*/:/;print'|
- sort|uniq -c|sort -n|grep '[0-9][0-9] '
- 11 Helped-By:
- 13 Message-ID:
- 14 Reported-By:
- 22 Acked-By:
- 27 Inspired-by:
- 29 Requested-by:
- 35 Original-patch-by:
- 43 Contributions-by:
- 47 Signed-Off-By:
- 65 Based-on-patch-by:
- 68 Thanks-to:
- 88 Improved-by:
- 145 Co-authored-by:
- 171 Noticed-by:
- 182 Tested-by:
- 361 Suggested-by:
- 469 Mentored-by:
- 1196 Reported-by:
- 1727 Helped-by:
- 2177 Reviewed-by:
- 2202 Acked-by:
- 95313 Signed-off-by:
+ There seems to be consensus amongst the core Git community on a working
+ set of common trailers, and there are non-trivial costs to people
+ inventing new trailers (research to discover what they mean/how they
+ differ from existing trailers) such that inventing new ones is generally
+ unwarranted and not something to be recommended to new contributors.
+ Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Josh Soref <jsoref@gmail.com>
## Documentation/SubmittingPatches ##
@@ Documentation/SubmittingPatches: If you like, you can put extra tags at the end:
-
- . `Reported-by:` is used to credit someone who found the bug that
- the patch attempts to fix.
-+. `Noticed-by:` liked `Reported-by:` indicates someone who noticed
-+ the item being fixed.
- . `Acked-by:` says that the person who is more familiar with the area
- the patch attempts to modify liked the patch.
- . `Reviewed-by:`, unlike the other tags, can only be offered by the
-@@ Documentation/SubmittingPatches: If you like, you can put extra tags at the end:
- patch after a detailed analysis.
. `Tested-by:` is used to indicate that the person applied the patch
and found it to have the desired effect.
-+. `Co-authored-by:` is used to indicate that multiple people
-+ contributed to the work of a patch.
-+. `Helped-by:` is used to credit someone with helping develop a
-+ patch.
-+. `Mentored-by:` is used to credit someone with helping develop a
-+ patch.
-+. `Suggested-by:` is used to credit someone with suggesting the idea
-+ for a patch.
- You can also create your own tag or use one that's in common usage
+-You can also create your own tag or use one that's in common usage
-such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
-+such as "Thanks-to:", "Based-on-patch-by:", or "Improved-by:".
++While you can also create your own trailer if the situation warrants it, we
++encourage you to instead use one of the common trailers in this project
++highlighted above.
[[git-tools]]
=== Generate your patch using Git tools out of your commits.
-: ----------- > 5: 8848572fe2c SubmittingPatches: update extra tags list
5: 11688e4360c ! 6: 8f16c7caa73 SubmittingPatches: improve extra tags advice
@@ Commit message
Signed-off-by: Josh Soref <jsoref@gmail.com>
## Documentation/SubmittingPatches ##
-@@ Documentation/SubmittingPatches: If you like, you can put extra tags at the end:
- You can also create your own tag or use one that's in common usage
- such as "Thanks-to:", "Based-on-patch-by:", or "Improved-by:".
+@@ Documentation/SubmittingPatches: While you can also create your own trailer if the situation warrants it, we
+ encourage you to instead use one of the common trailers in this project
+ highlighted above.
+Extra tags should only capitalize the very first letter, i.e. favor
+"Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
6: 043d2a24202 ! 7: cdb5fd0957f SubmittingPatches: clarify GitHub visual
@@ Metadata
## Commit message ##
SubmittingPatches: clarify GitHub visual
- Some people would expect a cross to be upright, and potentially have
- unequal lengths...
+ GitHub has two general forms for its states, sometimes they're a simple
+ colored object (e.g. green check or red x), and sometimes there's also a
+ colored container (e.g. green box or red circle) with containing that
+ object (e.g. check or x).
- GitHub uses a white x overlaying a solid red circle to indicate failure.
+ That's a lot of words to try to describe things, but in general, the key
+ for a failure is that it's recognized as an `x` and that it's associated
+ with the color red -- the color of course is problematic for people who
+ are red-green color-blind, but that's why they are paired with distinct
+ shapes.
+
+ Using the term `cross` doesn't really help.
Signed-off-by: Josh Soref <jsoref@gmail.com>
7: cdab65a4259 ! 8: 77576327df8 SubmittingPatches: clarify GitHub artifact format
@@ Commit message
GitHub wraps artifacts generated by workflows in a .zip file.
- Internally workflows can package anything they like in them.
+ Internally, workflows can package anything they like in them.
A recently generated failure artifact had the form:
8: 92469324813 = 9: a4878f58fe4 SubmittingPatches: hyphenate non-ASCII
--
gitgitgadget
^ permalink raw reply
* Re: [PATCH 0/1] revert/cherry-pick: add --show-current-patch option
From: phillip.wood123 @ 2023-12-21 16:32 UTC (permalink / raw)
To: Michael Lohmann; +Cc: git, mi.al.lohmann, newren, phillip.wood
In-Reply-To: <20231220065141.7599-1-mi.al.lohmann@gmail.com>
Hi Michael
On 20/12/2023 06:51, Michael Lohmann wrote:
> Hi Phillip
>
> On 18/12/2023 16:42, Phillip Wood wrote:
>> Thanks for bringing this up I agree it can be very helpful to look at
>> the original commit when resolving cherry-pick and revert conflicts.
As an aside I find it useful is to do a kind of range-diff before
committing the conflict resolution. Unfortunately one cannot use "git
range-diff" because the conflict resolution is not yet committed.
Instead I use
diff <(git diff CHERRY_PICK_HEAD^-) <(git diff HEAD)
in practice it is helpful to pipe the diffs through sed to delete the
"index" lines and normalize the hunk headers.
>> I'm in two minds about this change though - I wonder if it'd be better
>> to improve the documentation for CHERRY_PICK_HEAD and REVERT_HEAD and
>> tell users to run "git show CHERRY_PICK_HEAD" instead. I think the
>> main reason we have a "--show-current-patch" option for "rebase" is
>> that there are two different implementations of that command and the
>> patched-based one of them does not support REBASE_HEAD. That reasoning
>> does not apply to "cherry-pick" and "revert" and
>> "--show-current-patch" suggests a patch-based implementation which is
>> also not the case for these commands.
>
> I appreciate the urge of limiting the interface to the minimum needed
> and not to duplicate functionality that already exists. On the other
> hand, this would
> a) grant the user the same experience, not having to wonder about
> implementation details such as different backends for rebase, but not
> for revert/cherry-pick and
> b) (I know it is more indicative of me, but:) when I am looking for a
> feature in software and I look into the respective man page I tend to
> focus first on the synopsis instead of reading the whole page (or
> sometimes I even just rely on the shell autocompletion for
> discoverability).
>
> So yes, mentioning REVERT_HEAD and CHERRY_PICK_HEAD in the respective
> docs would technically be sufficient, but I don't think it is as
> discoverable to an average user (who does not know about the details of
> all the existing pseudo refs) as a toplevel action would be. But an
> assessment of the pros and cons is not on me to decide.
To make the psuedo refs discoverable we should certainly be mentioning
them in the section about resolving conflicts. I haven't checked what
the docs say at the moment but a worked example showing how to inspect
the conflicts and the original changes would be helpful I think. That
does assume that the user actually reads the section about resolving
conflicts rather than just scanning the available command line options
though.
> I have to be honest: I have troubles distinguishing a "patch" and a
> "diff", the latter of which `git show <commit>` shows according to the
> documentation ("For commits it shows the log message and textual
> diff."), though my understanding was that a patch is a diff + context
> lines, which is what `git show` actually shows... I think this is
> probably why I don't feel so strong about the potential loose usage of
> the word here.
I think for the purposes of this discussion "patch" and "diff" are
largely interchangeable (a "patch" is essentially a "diff" with a commit
message). Maybe I'm overthinking it but the reason I'm not very keen on
"--show-current-patch" (in addition to the "duplicate functionality"
argument you mention above) is that cherry-pick and revert do not work
by applying patches (or diffs) but use a 3-way merge instead. I think
--show-current-patch first appeared as an option to "git am" which makes
sense as that command is all about applying patches.
I'd be interested to hear what other people think about whether it makes
"--show-current-patch" make sense for other commands.
> Also the documentation of cherry-pick already uses the word "patch" in a
> (according to my understanding from a technical perspective) sloppy (but
> from a layman's point of view probably nevertheless helpful) way:
>
>> The following sequence attempts to backport a patch, bails out because
>> the code the patch applies to has changed too much, and then tries
>> again, this time exercising more care about matching up context lines.
>>
>> ------------
>> $ git cherry-pick topic^ <1>
>> $ git diff <2>
>> $ git cherry-pick --abort <3>
>> $ git cherry-pick -Xpatience topic^ <4>
>> ------------
>> <1> apply the change that would be shown by `git show topic^`.
>> In this example, the patch does not apply cleanly, so
>> information about the conflict is written to the index and
>> working tree and no new commit results.
>
> Should that also be rephrased?
It would certainly be more accurate for the first paragraph to say
something like
The following sequence tries to backport a commit. It bails out
because the code modified by the commit has conflicting changes in
the current branch.
The bit about exercising more care about matching up context lines is
moot these days as the default merge strategy is "ort" which uses the
histogram diff algorithm to do just that so commands <3> & <4> should
not be needed.
>
> Out of curiosity: The following from the rebase docs seems to imply that
> the apply backend will probably be removed in the future:
>> --apply
>> Use applying strategies to rebase (calling git-am
>> internally). This option may become a no-op in the future
>> once the merge backend handles everything the apply one
>> does.
>
> But I would expect the `rebase --show-current-patch` still to be
> working. Would that only be a legacy compatibility flag and instead also
> for rebases the recommended option would be to run
> `git show REBASE_HEAD`?
The long term goal is to remove the apply backend but I don't think
anyone is actively working on it at the moment. We'd certainly need to
keep the --show-current-patch option for backwards compatibility.
I'll be off the list for the next couple of weeks but I'll be sure to
catch up with this thread in the New Year
Best Wishes
Phillip
^ permalink raw reply
* Re: [PATCH 4/8] SubmittingPatches: update extra tags list
From: phillip.wood123 @ 2023-12-21 15:09 UTC (permalink / raw)
To: Josh Soref, Elijah Newren; +Cc: git, phillip.wood
In-Reply-To: <CACZqfqCar=tay5diocU7jVWBwPUFqewNYfPLHkYvvR1fSBSdPA@mail.gmail.com>
Hi Josh
On 20/12/2023 17:42, Josh Soref wrote:
> SubmittingPatches: discourage new trailers
>
> There seems to be consensus amongst the core Git community on a working
> set of common trailers, and there are non-trivial costs to people
> inventing new trailers (research to discover what they mean/how they
> differ from existing trailers) such that inventing new ones is generally
> unwarranted and not something to be recommended to new contributors.
>
> Suggested-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Josh Soref <jsoref@gmail.com>
>
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 32e9023877..58dfe40504 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -358,4 +358,5 @@ If you like, you can put extra tags at the end:
>
> -You can also create your own tag or use one that's in common usage
> -such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
> +While you can also create your own trailer if the situation warrants it, we
> +encourage you to instead use one of the common trailers in this project
> +highlighted above.
Thanks for this, it looks good and would be a useful addition to v2 of
your patch series.
Best Wishes
Phillip
^ permalink raw reply
* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: phillip.wood123 @ 2023-12-21 15:06 UTC (permalink / raw)
To: Jeff King, phillip.wood
Cc: René Scharfe, AtariDreams via GitGitGadget, git, Seija Kijin
In-Reply-To: <20231221095606.GB570888@coredump.intra.peff.net>
Hi Peff
On 21/12/2023 09:56, Jeff King wrote:
> On Fri, Dec 15, 2023 at 02:46:36PM +0000, Phillip Wood wrote:
>
>>> Yeah. b2() is wrong for passing "2" to a bool.
>>
>> I think it depends what you mean by "wrong" §6.3.1.2 of standard is quite
>> clear that when any non-zero scalar value is converted to _Bool the result
>> is "1"
>
> Yeah, sorry, I was being quite sloppy with my wording. I meant "wrong"
> as in "I would ideally flag this in review for being weird and
> confusing".
That makes sense, it certainly is confusing
> Of course there are many reasonable cases where you might pass an
> integer "foo" rather than explicitly booleanizing it with "!!foo". So I
> do agree it's a real potential problem (and I'm sufficiently convinced
> that we should avoid an "int" fallback if we can).
I had a look at gnulib the other day and the list of limitations in the
documentation of their <stdbool.h> fallback makes it look quite
unattractive. They helpfully list some compilers where _Bool is not
implemented (IRIX, Tru64) or does not work correctly (HP-UX, AIX). As
far as I can see all the bug reports cited are from 2003-2006 on
obsolete compiler versions, hopefully _Bool is better supported these days.
Best Wishes
Phillip
> -Peff
^ permalink raw reply
* Re: git bisect stuck - --force flag required for checkout
From: brian m. carlson @ 2023-12-21 12:22 UTC (permalink / raw)
To: Devste Devste; +Cc: git
In-Reply-To: <CANM0SV3SEF28QJ2V0Q9ydp8yDbL8TDc1m871oxOB=UtwF1TtxQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2016 bytes --]
On 2023-12-21 at 10:47:57, Devste Devste wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
> add file Foo.txt to .git and commit
> add some commits with any changes to other files, as this is needed
> for reproduction
> run: git config core.ignorecase false
`core.ignorecase` is specifically designed for this case. It's set
internally by Git when the repository is created, and it's not supposed
to be changed by the user.
If you want a repository where there's no case sensitivity, then I'd
recommend WSL. It's also possible to make some directories case
sensitive in Windows 10 and newer and allegedly that works recursively,
so you could use `fsutil` to do that, then run `git init`, then add
data.
> rename Foo.txt to foo.txt and commit
> add some commits with any changes to other files, as this is needed
> for reproduction
> run: git bisect start && git bisect bad
> eventually, when running "git bisect good" (or bad) you will get an error:
> >error: The following untracked working tree files would be overwritten by checkout:
> >Foo.php
>
> Anything else you want to add:
> git bisect good/bad needs to have support for a "--force" flag, which
> is passed to the git checkout it runs internally
> At the moment git bisect cannot be used on Windows, as there is no way
> to continue the bisect from here.
> Changing the "git config core.ignorecase true" temporarily is not an
> option, as this will introduce a variety of other bugs,
> which, on Windows, eventually will require you to completely delete
> and reclone the repo, as Windows file paths are case-insensitive
Could you share what those problems are? `core.ignorecase` is
specifically designed to deal with case-insensitive file systems, and
that's why Git sets it to true.
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply
* Re: [PATCH] t1006: add tests for %(objectsize:disk)
From: René Scharfe @ 2023-12-21 12:19 UTC (permalink / raw)
To: Jeff King; +Cc: git, Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <20231221094722.GA570888@coredump.intra.peff.net>
Am 21.12.23 um 10:47 schrieb Jeff King:
> On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:
>
>>> This adds a packed-object function, but I doubt anybody actually calls
>>> it. If we're going to do that, it's probably worth adding some tests for
>>> "cat-file --batch-check" or similar.
>>
>> Yes, and I was assuming that someone else would be eager to add such
>> tests. *ahem*
>
> :P OK, here it is. This can be its own topic, or go on top of the
> rs/t6300-compressed-size-fix branch.
Great, thank you!
> -- >8 --
> Subject: [PATCH] t1006: add tests for %(objectsize:disk)
>
> Back when we added this placeholder in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), there were no tests,
> claiming "[...]the exact numbers returned are volatile and subject to
> zlib and packing decisions".
>
> But we can use a little shell hackery to get the expected numbers
> ourselves. To a certain degree this is just re-implementing what Git is
> doing under the hood, but it is still worth doing. It makes sure we
> exercise the %(objectsize:disk) code at all, and having the two
> implementations agree gives us more confidence.
>
> Note that our shell code assumes that no object appears twice (either in
> two packs, or as both loose and packed), as then the results really are
> undefined. That's OK for our purposes, and the test will notice if that
> assumption is violated (the shell version would produce duplicate lines
> that Git's output does not have).
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I stole a bit of your awk. You can tell because I'd have written it in
> perl. ;)
I think we can do it even in shell, especially if...
>
> t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 271c5e4fd3..21915be308 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
> cmp expect actual
> '
>
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> + # our state has both loose and packed objects,
> + # so find both for our expected output
> + {
> + find .git/objects/?? -type f |
> + awk -F/ "{ print \$0, \$3\$4 }" |
> + while read path oid
> + do
> + size=$(test_file_size "$path") &&
> + echo "$oid $size" ||
> + return 1
> + done &&
> + rawsz=$(test_oid rawsz) &&
> + find .git/objects/pack -name "*.idx" |
> + while read idx
> + do
> + git show-index <"$idx" >idx.raw &&
> + sort -n <idx.raw >idx.sorted &&
> + packsz=$(test_file_size "${idx%.idx}.pack") &&
> + end=$((packsz - rawsz)) &&
> + awk -v end="$end" "
> + NR > 1 { print oid, \$1 - start }
> + { start = \$1; oid = \$2 }
> + END { print oid, end - start }
> + " idx.sorted ||
... we stop slicing the data against the grain. Let's reverse the order
(sort -r), then we don't need to carry the oid forward:
sort -nr <idx.raw >idx.sorted &&
packsz=$(test_file_size "${idx%.idx}.pack") &&
end=$((packsz - rawsz)) &&
awk -v end="$end" "
{ print \$2, end - \$1; end = \$1 }
" idx.sorted ||
And at that point it should be easy to use a shell loop instead of awk:
while read start oid rest
do
size=$((end - start)) &&
end=$start &&
echo "$oid $size" ||
return 1
done <idx.sorted
> + return 1
> + done
> + } >expect.raw &&
> + sort <expect.raw >expect &&
The reversal above becomes irrelevant with that line, so the result in
expect stays the same.
Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.
> + git cat-file --batch-all-objects \
> + --batch-check="%(objectname) %(objectsize:disk)" >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'set up replacement object' '
> orig=$(git rev-parse HEAD) &&
> git cat-file commit $orig >orig &&
One more thing: We can do the work of the first awk invocation in the
already existing loop as well:
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> + # our state has both loose and packed objects,
> + # so find both for our expected output
> + {
> + find .git/objects/?? -type f |
> + awk -F/ "{ print \$0, \$3\$4 }" |
> + while read path oid
> + do
> + size=$(test_file_size "$path") &&
> + echo "$oid $size" ||
> + return 1
> + done &&
... but the substitutions are a bit awkward:
find .git/objects/?? -type f |
while read path
do
basename=${path##*/} &&
dirname=${path%/$basename} &&
oid="${dirname#.git/objects/}${basename}" &&
size=$(test_file_size "$path") &&
echo "$oid $size" ||
return 1
done &&
The avoided awk invocation might be worth the trouble, though.
René
^ permalink raw reply
* Re: Git mirror at gitlab
From: Patrick Steinhardt @ 2023-12-21 11:48 UTC (permalink / raw)
To: Olliver Schinagl
Cc: git, gitster, Ævar Arnfjörð Bjarmason, psteinhardt
In-Reply-To: <2a833bfc-a075-4e78-ae6c-270f5198d498@schinagl.nl>
[-- Attachment #1: Type: text/plain, Size: 3239 bytes --]
On Thu, Dec 21, 2023 at 12:30:02PM +0100, Olliver Schinagl wrote:
> Hey list,
>
> For years, I wanted (tried, but time) to run the mirror for git on gitlab.
> Actually, the original idea was to run a docker container ([0] 10k+ pulls
> :p)
>
> I initially set this up via docker build containers, where docker hub would
> pull my mirror of the git repo. My mirror, because I added a Dockerfile
> which was enough for docker to do its trick. I was planning (time ..) on
> submitting this upstream to the list, but never did. Because of me not doing
> that, I had to manually (I was even too lazy to script it) rebase the
> branch. Docker then did some changes to their business, where the docker
> builds where not possible anymore.
>
> So then I figured, I'll do the same on gitlab and push it to the docker hub.
> Thus I setup a mirror on gitlab [1], with the idea to work there on it.
>
> Again, I never got around to finalize this work, mostly because the docker
> container 'just worked' for pretty much everything. After all, git is very
> stable overal.
>
> So very interestingly, last month commit 0e3b67e2aa25edb ("ci: add support
> for GitLab CI") landed, which started to trigger pipeline jobs!
>
> Sadly, this only worked for 3 builds, as that's when the minutes ran out :)
>
> So one, I would very much like to offer the registered names (cause they are
> pretty nice in name) to here, so people can use and find it.
Not to throw a wrench into this, but are you aware of the official
GitLab mirror at https://gitlab.com/git-vcs/git? I myself wasn't aware
of this mirror for a rather long time.
I also wondered whether we want to have https://gitlab.com/git/git as we
do on GitHub. I don't think anybody registered it, but it is blocked
from being registered as far as I can tell. Maybe we block the namespace
out of caution, I dunno. I can certainly check in with our SREs in case
it is something the Git project would like to own.
> Two, hopefully get Patrick Steinhardt to help out to get unlimited minutes
> and storage on the repo :)
I'm sure we can do something here, but I'd rather aim to do this for the
official mirror which currently is the one I mentioned above. If the
project is interested in running builds on GitLab then I'm happy to
coordinate.
Also Cc Ævar, who is the current owner of the mirror. Would it be
possible to add myself as a second owner to the project? This might help
setting up the CI infrastructure. But please, if anybody disagrees with
me being added as an owner here I encourage you to say so.
> Three, see what the opinion of people here is on this. I'll do the work to
> get the dockerfile (now containerfile, we're inclusive after all) merged,
> and the CI file updated to create, store (and push to docker hub) the
> generated containers.
I don't really have much of an opinion here and will leave it to others
to discuss.
Thanks!
Patrick
PS: As most other folks I'll be OOO during holidays, so I may only
answer sporadically.
> Thanks,
> Olliver
>
> [0]: https://hub.docker.com/r/gitscm/git
> [1]: https://gitlab.com/gitscm/git
>
> P.S. I'm not subscribed, so please keep me in the CC :)
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Git mirror at gitlab
From: Olliver Schinagl @ 2023-12-21 11:30 UTC (permalink / raw)
To: git, ps, gitster
Hey list,
For years, I wanted (tried, but time) to run the mirror for git on
gitlab. Actually, the original idea was to run a docker container ([0]
10k+ pulls :p)
I initially set this up via docker build containers, where docker hub
would pull my mirror of the git repo. My mirror, because I added a
Dockerfile which was enough for docker to do its trick. I was planning
(time ..) on submitting this upstream to the list, but never did.
Because of me not doing that, I had to manually (I was even too lazy to
script it) rebase the branch. Docker then did some changes to their
business, where the docker builds where not possible anymore.
So then I figured, I'll do the same on gitlab and push it to the docker
hub. Thus I setup a mirror on gitlab [1], with the idea to work there on it.
Again, I never got around to finalize this work, mostly because the
docker container 'just worked' for pretty much everything. After all,
git is very stable overal.
So very interestingly, last month commit 0e3b67e2aa25edb ("ci: add
support for GitLab CI") landed, which started to trigger pipeline jobs!
Sadly, this only worked for 3 builds, as that's when the minutes ran out :)
So one, I would very much like to offer the registered names (cause they
are pretty nice in name) to here, so people can use and find it.
Two, hopefully get Patrick Steinhardt to help out to get unlimited
minutes and storage on the repo :)
Three, see what the opinion of people here is on this. I'll do the work
to get the dockerfile (now containerfile, we're inclusive after all)
merged, and the CI file updated to create, store (and push to docker
hub) the generated containers.
Thanks,
Olliver
[0]: https://hub.docker.com/r/gitscm/git
[1]: https://gitlab.com/gitscm/git
P.S. I'm not subscribed, so please keep me in the CC :)
^ permalink raw reply
* Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
From: Jeff King @ 2023-12-21 11:13 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <ZXxy1USjjjAbBi++@nand.local>
On Fri, Dec 15, 2023 at 10:37:57AM -0500, Taylor Blau wrote:
> On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> > So my question is: how much of what you're seeing is from (1) and (2),
> > and how much is from (3)? Because there are other ways to trigger (3),
> > such as lowering the window size. For example, if you try your same
> > packing example with --window=0, how do the CPU and output size compare
> > to the results of your series? (I'd also check peak memory usage).
>
> Interesting question! Here are some preliminary numbers on my machine
> (which runs Debian unstable with a Intel Xenon W-2255 CPU @ 3.70GHz and
> 64GB of RAM).
>
> I ran the following hyperfine command on my testing repository, which
> has the Git repository broken up into ~75 packs or so:
Thanks for running these tests. The results are similar to what
expected, which is: yes, most of your CPU savings are from skipping
deltas, but not all.
Here's what I see (which I think is mostly redundant with what you've
said, but I just want to lay out my line of thinking). I'll reorder your
quoted sections a bit as I go:
> Benchmark 2: multi-pack reuse, pack.window=0
> [...]
> Time (mean ± σ): 1.075 s ± 0.005 s [User: 0.990 s, System: 0.188 s]
> Range (min … max): 1.071 s … 1.088 s 10 runs
>
> Benchmark 4: multi-pack reuse, pack.window=10
> [...]
> Time (mean ± σ): 1.028 s ± 0.002 s [User: 1.150 s, System: 0.184 s]
> Range (min … max): 1.026 s … 1.032 s 10 runs
OK, so when we're doing more full ("multi") reuse, the pack window
doesn't make a big difference either way. You didn't show the stderr
from each, but presumably most of the objects are hitting the "reuse"
path, and only a few are deltas (and that is backed up by the fact that
doing deltas only gives us a slight improvement in the output size:
> Benchmark 2: multi-pack reuse, pack.window=0
> 268.670 MB
> Benchmark 4: multi-pack reuse, pack.window=10
> 266.473 MB
Comparing the runs with less reuse:
> Benchmark 1: single-pack reuse, pack.window=0
> [...]
> Time (mean ± σ): 1.248 s ± 0.004 s [User: 1.160 s, System: 0.188 s]
> Range (min … max): 1.244 s … 1.259 s 10 runs
>
> Benchmark 3: single-pack reuse, pack.window=10
> [...]
> Time (mean ± σ): 6.281 s ± 0.024 s [User: 43.727 s, System: 0.492 s]
> Range (min … max): 6.252 s … 6.326 s 10 runs
there obviously is a huge amount of time saved by not doing deltas, but
we pay for it with a much bigger pack:
> Benchmark 1: single-pack reuse, pack.window=0
> 264.443 MB
> Benchmark 3: single-pack reuse, pack.window=10
> 194.355 MB
But of course that "much bigger" pack is about the same size as the one
we get from doing multi-pack reuse. Which is not surprising, because
both are avoiding looking for new deltas (and the packs after the
preferred one probably have mediocre deltas).
So I do actually think that disabling pack.window gives you a
similar-ish tradeoff to expanding the pack-reuse code (~6s down to ~1s,
and a 36% embiggening of the resulting pack size).
Which implies that one option is to scrap your entire series and just
set pack.window. Basically comparing multi/10 (your patches) to single/0
(a hypothetical config option), which have similar run-times and pack
sizes.
But that's not quite the whole story. There is still a CPU improvement
in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
memory improvement from avoiding the extra book-keeping (almost 10%):
> Benchmark 1: single-pack reuse, pack.window=0
> 354.224 MB (max RSS)
> Benchmark 4: multi-pack reuse, pack.window=10
> 328.786 MB (max RSS)
So while it's a lot less code to just set the window size, I do think
those improvements are worth it. And really, it's the same tradeoff we
make for the single-pack case (i.e., one could argue that we
could/should rip out the verbatim-reuse code entirely in favor of just
tweaking the window size).
> It's pretty close between multi-pack reuse with a window size of 0 and
> a window size of 10. If you want to optimize for pack size, you could
> trade a ~4% reduction in pack size for a ~1% increase in peak memory
> usage.
I think if you want to optimize for pack size, you should consider
repacking all-into-one to get better on-disk deltas. ;) I know that's
easier said than done when the I/O costs are significant. I do wonder if
storing thin packs on disk would let us more cheaply reach a state that
could serve optimal-ish packs without spending CPU computing bespoke
deltas for each client. But that's a much larger topic.
-Peff
^ permalink raw reply
* Re: [PATCH 0/8] reftable: small set of fixes
From: Han-Wen Nienhuys @ 2023-12-21 11:08 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <cover.1700549493.git.ps@pks.im>
On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
> It's a bit unfortunate that we don't yet have good test coverage as
> there are no end-to-end tests, and most of the changes I did are not
> easily testable in unit tests. So until the reftable backend gets
..
> reftable/stack: perform auto-compaction with transactional interface
you can test autocompaction more precisely using the stats
in reftable_compaction_stats. See
https://github.com/hanwen/reftable/blob/master/stack_test.go#L126
for an example.
--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
^ permalink raw reply
* Re: [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Han-Wen Nienhuys @ 2023-12-21 10:58 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Taylor Blau, git, Jonathan Nieder
In-Reply-To: <ZXbRmwj1vZ2dA3s9@tanuki>
On Mon, Dec 11, 2023 at 10:08 AM Patrick Steinhardt <ps@pks.im> wrote:
> > This initially caught me by surprise, but on closer inspection I agree
> > that this is OK, since stack_filename() calls strbuf_reset() before
> > adjusting the buffer contents.
> >
> > (As a side-note, I do find the side-effect of stack_filename() to be a
> > little surprising, but that's not the fault of this series and not worth
> > changing here.)
>
> Agreed, I also found this to be a bit confusing at first. I'll amend the
> commit message with "Note that we do not have to manually reset the
> buffer because `stack_filename()` does this for us already." to help
> future readers.
In C++ it is expected that assignment operators clear the destination
before executing the assignment, so it depends on your expectations.
If this is confusing, maybe another name is in order?
--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
^ permalink raw reply
* Re: [PATCH v2 00/26] pack-objects: multi-pack verbatim reuse
From: Jeff King @ 2023-12-21 10:51 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git, Patrick Steinhardt
In-Reply-To: <ZXurD1NTZ4TAs7WZ@nand.local>
On Thu, Dec 14, 2023 at 08:25:35PM -0500, Taylor Blau wrote:
> On Thu, Dec 14, 2023 at 04:40:40PM -0800, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> > > I haven't looked into the details yet, but it seems that
> > > t5309-pack-delta-cycles.sh fails under
> > >
> > > $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make -j16 test
> >
> > Hmph, this seems to be elusive. I tried it again and then it did
> > not fail this time.
>
> Indeed, but I was able to reproduce the failure both on my branch and on
> 'master' under --stress, yielding the following failure in t5309.6:
OK, so it's nothing new, and we can ignore it for your series (I haven't
seen it in the wild yet, but it is something we may need to deal with in
the long run if it keeps popping up).
> + git index-pack --fix-thin --stdin
> fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)
>
> =================================================================
> ==3904583==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 32 byte(s) in 1 object(s) allocated from:
> #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
> #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
> #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
> #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
> #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
> #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
> #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
> #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
> Aborted
>
> The duplicate base thing is a red-herring, and is an expected result of
> the test. But the leak is definitely not, and I'm not sure what's going
> on here since the frames listed above are in the LSan runtime, not Git.
I suspect this is a race in LSan caused by a thread calling exit() while
other threads are spawning. Here's my theory.
When a thread is spawned, LSan needs to know where its stack is (so it
can look for points to reachable memory). It calls pthread_getattr_np(),
which gets an attributes object that must be cleaned up with
pthread_attr_destroy(). Presumably it does this shortly after. But
there's a race window where that attr object is allocated and we haven't
yet set up the new thread's info. If another thread calls exit() then,
LSan will run but its book-keeping will be in an inconsistent state.
One way to work around that would be to make the creation of the threads
atomic. That is, create each in a suspended state, and only let them run
once they are all created. There's no option in pthreads to do this, but
we can simulate it by having them block on a mutex before starting. And
indeed, we already have such a lock: the work_lock() that they all use
to get data to process.
After applying this patch:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dda94a9f46..0e94819216 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1257,13 +1257,15 @@ static void resolve_deltas(void)
base_cache_limit = delta_base_cache_limit * nr_threads;
if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
init_thread();
+ work_lock();
for (i = 0; i < nr_threads; i++) {
int ret = pthread_create(&thread_data[i].thread, NULL,
threaded_second_pass, thread_data + i);
if (ret)
die(_("unable to create thread: %s"),
strerror(ret));
}
+ work_unlock();
for (i = 0; i < nr_threads; i++)
pthread_join(thread_data[i].thread, NULL);
cleanup_thread();
I ran t5309 with "--stress --run=6" for about 5 minutes with no failures
(whereas without the patch, I usually see a failure in 10 seconds or
so).
So it's a pretty easy fix, though I don't love it in general. Every
place that spawns multiple threads that can die() would need the same
treatment. And this isn't a "real" leak in any reasonable sense; it only
happens because we're exiting the program directly, at which point all
of the memory is returned to the OS anyway. So I hate changing
production code to satisfy a leak-checking false positive.
OTOH, dealing with false positives is annoying for humans, and the
run-time cost should be negligible. We can work around this one, and
avoid making the same change in other spots unless somebody sees a racy
failure in practice.
-Peff
^ permalink raw reply related
* Re: [PATCH v2 08/11] reftable/stack: fix use of unseeded randomness
From: Han-Wen Nienhuys @ 2023-12-21 10:49 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <bab4fb93df8d1a620eefeef99a49ea52c98dfc6e.1702047081.git.ps@pks.im>
On Fri, Dec 8, 2023 at 3:53 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When writing a new reftable stack, Git will first create the stack with
> a random suffix so that concurrent updates will not try to write to the
I had noticed this already. Thanks for fixing!
--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
^ permalink raw reply
* Re: [PATCH 7/8] reftable/merged: reuse buffer to compute record keys
From: Han-Wen Nienhuys @ 2023-12-21 10:48 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jonathan Nieder
In-Reply-To: <23c060d1e21573581ca6c5db50ca756b61078e3e.1700549493.git.ps@pks.im>
On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When iterating over entries in the merged iterator's queue, we compute
> the key of each of the entries and write it into a buffer. We do not
> reuse the buffer though and thus re-allocate it on every iteration,
> which is wasteful given that we never transfer ownership of the
> allocated bytes outside of the loop.
>
From a brief read, change looks good.
In the C code, each key has to pass through the pqueue which is
(assuming auto-compaction) has log2(#tables) entries. The JGit code
assumes that the base table will be very large and the rest small.
This means that most keys come from the base table, and has some
special casing so those keys don't have to pass through the pqueue. If
you worry about efficiency, this might be something to look into.
--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
^ permalink raw reply
* git bisect stuck - --force flag required for checkout
From: Devste Devste @ 2023-12-21 10:47 UTC (permalink / raw)
To: git
Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.
What did you do before the bug happened? (Steps to reproduce your issue)
add file Foo.txt to .git and commit
add some commits with any changes to other files, as this is needed
for reproduction
run: git config core.ignorecase false
rename Foo.txt to foo.txt and commit
add some commits with any changes to other files, as this is needed
for reproduction
run: git bisect start && git bisect bad
eventually, when running "git bisect good" (or bad) you will get an error:
>error: The following untracked working tree files would be overwritten by checkout:
>Foo.php
Anything else you want to add:
git bisect good/bad needs to have support for a "--force" flag, which
is passed to the git checkout it runs internally
At the moment git bisect cannot be used on Windows, as there is no way
to continue the bisect from here.
Changing the "git config core.ignorecase true" temporarily is not an
option, as this will introduce a variety of other bugs,
which, on Windows, eventually will require you to completely delete
and reclone the repo, as Windows file paths are case-insensitive
[System Info]
git version:
git version 2.39.1.windows.1
cpu: i686
built from commit: b03dafd9c26b06c92d509a07ab01b01e6d0d85ee
sizeof-long: 4
sizeof-size_t: 4
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Windows 10.0
compiler info: gnuc: 12.2
libc info: no libc information available
[Enabled Hooks]
^ permalink raw reply
* Re: [PATCH 5/8] reftable/stack: perform auto-compaction with transactional interface
From: Patrick Steinhardt @ 2023-12-21 10:45 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git, Jonathan Nieder, psteinhardt
In-Reply-To: <CAOw_e7Yfdt_Wqm-9XDJknaN-iH=haP0R4K-S4c_E3EFDzvG5aA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 930 bytes --]
On Thu, Dec 21, 2023 at 11:29:52AM +0100, Han-Wen Nienhuys wrote:
> On Tue, Nov 21, 2023 at 8:07 AM Patrick Steinhardt <ps@pks.im> wrote:
> > Whenever updating references or reflog entries in the reftable stack, we
> > need to add a new table to the stack, thus growing the stack's length by
> ..
>
> bug is correctly identified, but can't the fix remove the
>
> if (!st->disable_auto_compact)
> return reftable_stack_auto_compact(st);
>
> fragment from reftable_stack_add? reftable_addition_commit eventually
> calls reftable_addition_commit.
"calls reftable_stack_add" you probably mean here. But yes, you're
right. As this patch series has already been merged to `next` I can roll
this cleanup into my second patch series that addresses issues with the
reftable library [1]. Does that work for you?
[1]: http://public-inbox.org/git/xmqqr0jgsn9g.fsf@gitster.g/T/#t
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox