Git development

Git development
 help / color / mirror / Atom feed

* Re: "git overlay" - command for overlaying branches
From: Oliver Bandel @ 2023-12-01  0:11 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: git
In-Reply-To: <20231125105046.GB9696@kitsune.suse.cz>

What I basically want to do is:

- Working on more than one branch at the same time.
- Having all checked-out branches together in one working dir.
- When adding/committing, all checked-out files will be added/committed
  to those branches, from where they were checked out.

This way I could edit multiple branches simultaneously independently without the need of merging.
Merging can be done later on, if needed.

Possible alternatives?

For the multiple checkouts I could do a workaround and use a local bare-repo
and check out each branch of interest into a seperate directory.
I could then edit the files from the different dirs.

But for the overlaying thing, to get the files into one dir
(with the automatic hiding of equally named files from branches that
were overlayed before) it would need something like a FUSE-based
tool, so that all the branches could be blended into one working dir.

A "git overlay" would make this easier.
No extra bare-repo needed, no additional tool for blending the dirs
needed.

Ciao,
  Oliver

^ permalink raw reply

* (no subject)
From: 5598162950 @ 2023-11-30 21:46 UTC (permalink / raw)
  To: 5595721311, 18884493239, git

[-- Attachment #1: text_0.txt --]
[-- Type: text/plain, Size: 66 bytes --]

Someone’s changing my stuff bad mom christal or Victoria or you!

^ permalink raw reply

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
From: Zach FettersMoore @ 2023-11-30 21:01 UTC (permalink / raw)
  To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git
In-Reply-To: <CAP8UFD1rd+q-dC_w2VgZ_jC++LDeF6gu5wDcbQzSuhU1ksfBpA@mail.gmail.com>

>> To see this in practice you can use the open source GitHub repo
>> 'apollo-ios-dev' and do the following in order:
>>
>> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

> It looks like there is a spurious A after 'apollo-ios' in the line above.

Thanks for catching that, definitely a typo on my part.

>> directories
>> -Create a commit containing these changes
>> -Do a split on apollo-ios-codegen
>> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

> I might be doing something stupid or wrong, but I get the following:
>
> $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> fatal: could not rev-parse split hash
> cc70a7d49e84696f0df210710445784c504ed748 from commit
> 360f068ea0d57f250621ab7dbe205313f52a0e98
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

Updated this to include doing a fetch to ensure all remote repo
info is available locally.

>> -Do a split on apollo-ios
>> - git subtree split --prefix=apollo-ios --squash --rejoin

> Same issue:
>
> $ git subtree split --prefix=apollo-ios --squash --rejoin
> fatal: could not rev-parse split hash
> b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
> b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111
>
> I didn't try to get farther than this, as it seems that some
> instructions might be missing.

Same as above, added extra instruction to do a fetch first.

Also added a little extra info that the issue may present after the
first split in the instructions depending on the current state of the
repo being used. Also added a way to do the same steps with the changes
applied to see that it resolves the issue.

>> So this commit makes a change to the processing of commits for the
>> split command in order to ignore non-mainline commits from other
>> subtrees such as apollo-ios in the above breakdown by adding a new
>> function 'should_ignore_subtree_commit' which is called during
>> 'process_split_commit'. This allows the split/rejoin processing to
>> still function as expected but removes all of the unnecessary
>> processing that takes place currently which greatly inflates the
>> processing time. In the above example, previously the final split
>> would take ~10-12 minutes, while after this fix it takes seconds.

> Nice!
>
> Except for the above issues in the commit message, the rest of the
> patch looks good to me, thanks!

Great! Thanks for the review and guidance!

^ permalink raw reply

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
From: Christian Couder @ 2023-11-30 20:33 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore
In-Reply-To: <pull.1587.v5.git.1701206267300.gitgitgadget@gmail.com>

On Tue, Nov 28, 2023 at 10:17 PM Zach FettersMoore via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> To see this in practice you can use the open source GitHub repo
> 'apollo-ios-dev' and do the following in order:
>
> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

It looks like there is a spurious A after 'apollo-ios' in the line above.

>  directories
> -Create a commit containing these changes
> -Do a split on apollo-ios-codegen
>    - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

I might be doing something stupid or wrong, but I get the following:

$ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
fatal: could not rev-parse split hash
cc70a7d49e84696f0df210710445784c504ed748 from commit
360f068ea0d57f250621ab7dbe205313f52a0e98
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

> -Do a split on apollo-ios
>    - git subtree split --prefix=apollo-ios --squash --rejoin

Same issue:

$ git subtree split --prefix=apollo-ios --squash --rejoin
fatal: could not rev-parse split hash
b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111

I didn't try to get farther than this, as it seems that some
instructions might be missing.

[...]

> So this commit makes a change to the processing of commits for the
> split command in order to ignore non-mainline commits from other
> subtrees such as apollo-ios in the above breakdown by adding a new
> function 'should_ignore_subtree_commit' which is called during
> 'process_split_commit'. This allows the split/rejoin processing to
> still function as expected but removes all of the unnecessary
> processing that takes place currently which greatly inflates the
> processing time. In the above example, previously the final split
> would take ~10-12 minutes, while after this fix it takes seconds.

Nice!

Except for the above issues in the commit message, the rest of the
patch looks good to me, thanks!

^ permalink raw reply

* [PATCH 2/2] completion: stop checking for reference existence via `test -f`
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <20231130202404.89791-2-stanhu@gmail.com>

In contrib/completion/git-completion.bash, there are a bunch of
instances where we read special refs like HEAD, MERGE_HEAD,
REVERT_HEAD, and others via the filesystem. However, the upcoming
reftable refs backend won't use '.git/HEAD' at all but instead will
write an invalid refname as placeholder for backwards compatibility,
which will break the git-completion script.

Update the '__git_ref_exists' function to:

1. Recognize the placeholder '.git/HEAD' written by the reftable
   backend (its content is specified in the reftable specs).
2. If reftable is in use, use 'git rev-parse' to determine whether the
    given ref exists.
3. Otherwise, continue to use 'test -f' to check for the ref's filename.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 9fbdc13f9a..f5b630ba99 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,12 +122,35 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Helper function to read the first line of a file into a variable.
+# __git_eread requires 2 arguments, the file path and the name of the
+# variable, in that order.
+#
+# This is taken from git-prompt.sh.
+__git_eread ()
+{
+	test -r "$1" && IFS=$'\r\n' read -r "$2" <"$1"
+}
+
 # Runs git in $__git_repo_path to determine whether a ref exists.
 # 1: The ref to search
 __git_ref_exists ()
 {
 	local ref=$1
 
+	# If the reftable is in use, we have to shell out to 'git rev-parse'
+	# to determine whether the ref exists instead of looking directly in
+	# the filesystem to determine whether the ref exists. Otherwise, use
+	# Bash builtins since executing Git commands are expensive on some
+	# platforms.
+	if __git_eread "$__git_repo_path/HEAD" head; then
+		b="${head#ref: }"
+		if [ "$b" == "refs/heads/.invalid" ]; then
+			__git -C "$__git_repo_path" rev-parse --verify --quiet "$ref" 2>/dev/null
+			return $?
+		fi
+	fi
+
 	[ -f "$__git_repo_path/$ref" ]
 }
 
-- 
2.42.0


^ permalink raw reply related

* [PATCH 1/2] completion: refactor existence checks for special refs
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <20231130202404.89791-1-stanhu@gmail.com>

In preparation for the reftable backend, this commit introduces a
'__git_ref_exists' function that continues to use 'test -f' to
determine whether a given ref exists in the local filesystem.

Each caller of '__git_ref_exists' must call '__git_find_repo_path`
first. '_git_restore' already used 'git rev-parse HEAD', but to use
'__git_ref_exists' insert a '__git_find_repo_path' at the start.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 13a39ebd2e..9fbdc13f9a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,6 +122,15 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Runs git in $__git_repo_path to determine whether a ref exists.
+# 1: The ref to search
+__git_ref_exists ()
+{
+	local ref=$1
+
+	[ -f "$__git_repo_path/$ref" ]
+}
+
 # Removes backslash escaping, single quotes and double quotes from a word,
 # stores the result in the variable $dequoted_word.
 # 1: The word to dequote.
@@ -1625,7 +1634,7 @@ __git_cherry_pick_inprogress_options=$__git_sequencer_inprogress_options
 _git_cherry_pick ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/CHERRY_PICK_HEAD ]; then
+	if __git_ref_exists CHERRY_PICK_HEAD; then
 		__gitcomp "$__git_cherry_pick_inprogress_options"
 		return
 	fi
@@ -2067,7 +2076,7 @@ _git_log ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_ref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$prev,$cur" in
@@ -2934,6 +2943,7 @@ _git_reset ()
 
 _git_restore ()
 {
+	__git_find_repo_path
 	case "$prev" in
 	-s)
 		__git_complete_refs
@@ -2952,7 +2962,7 @@ _git_restore ()
 		__gitcomp_builtin restore
 		;;
 	*)
-		if __git rev-parse --verify --quiet HEAD >/dev/null; then
+		if __git_ref_exists HEAD; then
 			__git_complete_index_file "--modified"
 		fi
 	esac
@@ -2963,7 +2973,7 @@ __git_revert_inprogress_options=$__git_sequencer_inprogress_options
 _git_revert ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/REVERT_HEAD ]; then
+	if __git_ref_exists REVERT_HEAD; then
 		__gitcomp "$__git_revert_inprogress_options"
 		return
 	fi
@@ -3592,7 +3602,7 @@ __gitk_main ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_ref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$cur" in
-- 
2.42.0


^ permalink raw reply related

* [PATCH 0/2] completion: refactor and support reftables backend
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu

Hi,

This patch series refactors and updates git-completion.bash to become
compatible with the upcoming reftables backend.

Stan Hu (2):
  completion: refactor existence checks for special refs
  completion: stop checking for reference existence via `test -f`

 contrib/completion/git-completion.bash | 43 +++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 5 deletions(-)

-- 
2.42.0


^ permalink raw reply

* Re: Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Jonny Grant @ 2023-11-30 20:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZWjdatp3SRb4mN6G@nand.local>



On 30/11/2023 19:07, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 06:11:57PM +0000, Jonny Grant wrote:
>> Hello
>>
>> May I suggest taking off the .00 KiB/s suffix, has that been
>> considered? As the decimal places don't appear to change, they're
>> stuck on .00.
> 
> I wonder if you have a throttled connection that is locked to 52KiB/s
> exactly.

You're right - I changed to https and it went to 6 MiB/s. Seems throttled on git://
I should have checked that first.

>  The progress code that generates the throughput is in
> progress.c::display_throughput(), which computes the rate. It's computed
> in bytes/misec, and then passed to throughput_string() (really,
> `strbuf_humanise_rate()`), which formats it appropriately.
> 
> If you're in the KiB range, it will print the decimal component, which
> is:
> 
>     ((bytes & ((1<<10)-1)) * 100) >> 10
> 
>> $ git clone git://gcc.gnu.org/git/gcc.git git_1
>> Cloning into 'git_1'...
>> remote: Enumerating objects: 2949348, done.
>> remote: Counting objects: 100% (209238/209238), done.
>> remote: Compressing objects: 100% (14579/14579), done.
>> Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s
> 
> On my machine:
> 
>     $ git.compile clone git://gcc.gnu.org/git/gcc.git
>     [...]
>     Receiving objects:  11% (342176/2949348), 108.09 MiB | 24.01 MiB/s
> 
> I suppose we could consider dropping the decimal component if it's a
> round number, but I think that it may produce awkward flickering if the
> rate oscillates between a round number and a non-round number.

Now I can see the advantage of it as it is. wget is similar.
Kind regards, Jonny



^ permalink raw reply

* Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
From: Taylor Blau @ 2023-11-30 19:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhha2h2zzZWCXrw@tanuki>

On Thu, Nov 30, 2023 at 11:18:19AM +0100, Patrick Steinhardt wrote:
> > Performing verbatim pack reuse naturally trades off between CPU time and
> > the resulting pack size. In the above example, the single-pack reuse
> > case produces a clone size of ~194 MB on my machine, while the
> > multi-pack reuse case produces a clone size closer to ~266 MB, which is
> > a ~37% increase in clone size.
>
> Quite exciting, and a tradeoff that may be worth it for Git hosters. I
> expect that this is going to be an extreme example of the benefits
> provided by your patch series -- do you by any chance also have "real"
> numbers that make it possible to quantify the effect a bit better?
>
> No worry if you don't, I'm just curious.

I don't have a great sense, no. I haven't run these patches yet in
production, although would like to do so soon for internal repositories
to get a better sense here.

There are some performance tests at the end which try and give you a
sense of at least the relative speed-up depending on how many disjoint
packs you have (IIRC, we test for 1, 10, and 100 disjoint packs).

> > I think there is still some opportunity to close this gap, since the
> > "packing" strategy here is extremely naive. In a production setting, I'm
> > sure that there are more well thought out repacking strategies that
> > would produce more similar clone sizes.
> >
> > I considered breaking this series up into smaller chunks, but was
> > unsatisfied with the result. Since this series is rather large, if you
> > have alternate suggestions on better ways to structure this, please let
> > me know.
>
> The series is indeed very involved to review. I only made it up to patch
> 8/24 and already spent quite some time on it. So I'd certainly welcome
> it if this was split up into smaller parts, but don't have a suggestion
> as to how this should be done (also because I didn't yet read the other
> 16 patches).

I suppose that one way to break it up might be:

    pack-objects: free packing_data in more places
    pack-bitmap-write: deep-clear the `bb_commit` slab
    pack-bitmap: plug leak in find_objects()

    midx: factor out `fill_pack_info()`
    midx: implement `DISP` chunk
    midx: implement `midx_locate_pack()`
    midx: implement `--retain-disjoint` mode

    pack-objects: implement `--ignore-disjoint` mode
    repack: implement `--extend-disjoint` mode

    pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
    pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
    pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
    pack-objects: parameterize pack-reuse routines over a single pack
    pack-objects: keep track of `pack_start` for each reuse pack
    pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
    pack-objects: prepare `write_reused_pack()` for multi-pack reuse
    pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
    pack-objects: include number of packs reused in output
    pack-bitmap: prepare to mark objects from multiple packs for reuse

    pack-objects: add tracing for various packfile metrics
    t/test-lib-functions.sh: implement `test_trace2_data` helper
    pack-objects: allow setting `pack.allowPackReuse` to "single"
    pack-bitmap: reuse objects from all disjoint packs
    t/perf: add performance tests for multi-pack reuse

Then you'd have five patch series, where each series does roughly the
following:

  1. Preparatory clean-up.
  2. Implementing the DISP chunk, as well as --retain-disjoint, without
     a way to generate such packs.
  3. Implement a way to generate such packs, but without actually being
     able to reuse more than one of them.
  4. Implement multi-pack reuse, but without actually reusing any packs.
  5. Enable multi-pack reuse (and implement the required scaffolding to
     do so), and test it.

That's the most sensible split that I could come up with, at least. But
I still find it relatively unsatisfying for a couple of reasons:

  - With the exception of the last group of patches, none of the earlier
    series enable any new, useful behavior on their own. IOW, if we just
    merged the first three series and then forgot about this topic, we
    wouldn't have done anything useful ;-).

  - The fourth series (which actually implements multi-pack reuse) still
    remains the most complicated, and would likely be the most difficult
    to review. So I think you'd still have one difficult series to
    review, plus four other series which are slightly less difficult to
    review ;-).

It's entirely possible that I'm just too close to these patches to see a
better split, so if you (or others) have any suggestions on how to break
this up, please don't hesitate to share them.

> I'll review the remaining patches at a later point in time.

Thanks, I'll look forward to more of your review as usual :-).

Thanks,
Taylor

^ permalink raw reply

* Minor UX annoyance w/`git add --patch untracked/file`
From: Vito Caputo @ 2023-11-30 19:26 UTC (permalink / raw)
  To: git

Hello list,

Couldn't the following two steps be done automagically by --patch:

```
git add -N path/to/untracked/file/wishing/to/partially/add
git add --patch path/to/untracked/file/wishing/to/partially/add
```

when one simply does:

`git add --patch path/to/untracked/file/wishing/to/partially/add`

?

Cheers,
Vito Caputo

^ permalink raw reply

* Re: [PATCH 08/24] pack-objects: implement `--ignore-disjoint` mode
From: Taylor Blau @ 2023-11-30 19:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhkdnVZ9w7tDBv@tanuki>

On Thu, Nov 30, 2023 at 11:18:57AM +0100, Patrick Steinhardt wrote:
> > Instead, teach `pack-objects` a special `--ignore-disjoint` which is the
> > moral equivalent of marking the set of disjoint packs as kept, and
> > ignoring their contents, even if it would have otherwise been packed. In
> > fact, this similarity extends down to the implementation, where each
> > disjoint pack is first loaded, then has its `pack_keep_in_core` bit set.
> >
> > With this in place, we can use the kept-pack cache from 20b031fede
> > (packfile: add kept-pack cache for find_kept_pack_entry(), 2021-02-22),
> > which looks up objects first in a cache containing just the set of kept
> > (in this case, disjoint) packs. Assuming that the set of disjoint packs
> > is a relatively small portion of the entire repository (which should be
> > a safe assumption to make), each object lookup will be very inexpensive.
>
> This cought me by surprise a bit. I'd have expected that in the end,
> most of the packfiles in a repository would be disjoint. Using for
> example geometric repacks, my expectation was that all of the packs that
> get written via geometric repacking would eventually become disjoint
> whereas new packs added to the repository would initially not be.

Which part are you referring to here? If you're referring to the part
where I say that the set of disjoint packs is relatively small in
proposition to the rest of the packs, I think I know where the confusion
is.

I'm not saying that the set of disjoint packs is small in comparison to
the rest of the repository by object count, but rather by count of packs
overall. You're right that packs from pushes will not be guaranteed to
be disjoint upon entering the repository, but will become disjoint when
geometrically repacked (assuming that the caller uses --ignore-disjoint
when repacking).

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 07/24] midx: implement `--retain-disjoint` mode
From: Taylor Blau @ 2023-11-30 19:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhi15VpeCRflDB@tanuki>

On Thu, Nov 30, 2023 at 11:18:51AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index d130e65b28..ac0c7b124b 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -54,6 +54,14 @@ write::
> >  		"disjoint". See the "`DISP` chunk and disjoint packs"
> >  		section in linkgit:gitformat-pack[5] for more.
> >
> > +	--retain-disjoint::
> > +		When writing a multi-pack index with a reachability
> > +		bitmap, keep any packs marked as disjoint in the
> > +		existing MIDX (if any) as such in the new MIDX. Existing
> > +		disjoint packs which are removed (e.g., not listed via
> > +		`--stdin-packs`) are ignored. This option works in
> > +		addition to the '+' marker for `--stdin-packs`.
>
> I'm trying to understand when you're expected to pass this flag and when
> you're expected not to pass it. This documentation could also help in
> the documentation here so that the user can make a more informed
> decision.

I think there are multiple reasons that you may or may not want to pass
that flag. Certainly if you're not using disjoint packs (and instead
only care about single-pack verbatim reuse over the MIDX's preferred
packfile), then you don't need to pass it.

But if you are using disjoint packs, you may want to pass it if you are
adding packs to the MIDX which are disjoint, _and_ you want to hold onto
the existing set of disjoint packs.

But if you want to change the set of disjoint packs entirely, you would
want to omit this flag (unless you knew a-priori that you were going to
drop all of the currently marked disjoint packs from the new MIDX you
are writing, e.g. with --stdin-packs).

If you think it would be useful, I could try and distill some of this
down, but I think that there is likely too much detail here for it to be
useful in user-facing documentation.

Thanks,
Taylor

^ permalink raw reply

* Re: Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Dragan Simic @ 2023-11-30 19:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonny Grant, git
In-Reply-To: <ZWjdatp3SRb4mN6G@nand.local>

On 2023-11-30 20:07, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 06:11:57PM +0000, Jonny Grant wrote:
>> $ git clone git://gcc.gnu.org/git/gcc.git git_1
>> Cloning into 'git_1'...
>> remote: Enumerating objects: 2949348, done.
>> remote: Counting objects: 100% (209238/209238), done.
>> remote: Compressing objects: 100% (14579/14579), done.
>> Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s
> 
> On my machine:
> 
>     $ git.compile clone git://gcc.gnu.org/git/gcc.git
>     [...]
>     Receiving objects:  11% (342176/2949348), 108.09 MiB | 24.01 MiB/s
> 
> I suppose we could consider dropping the decimal component if it's a
> round number, but I think that it may produce awkward flickering if the
> rate oscillates between a round number and a non-round number.

You're right, the resulting flickering would look really annoying.  In 
fact, I already modified the reported download speed in another project 
to avoid pretty much the same kind of flickering, and it looked much 
better without it.

^ permalink raw reply

* Re: [PATCH 05/24] midx: implement `DISP` chunk
From: Taylor Blau @ 2023-11-30 19:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhhaXiF_wC3er7@tanuki>

On Thu, Nov 30, 2023 at 11:18:45AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> > index 9fcb29a9c8..658682ddd5 100644
> > --- a/Documentation/gitformat-pack.txt
> > +++ b/Documentation/gitformat-pack.txt
> > @@ -396,6 +396,22 @@ CHUNK DATA:
> >  	    is padded at the end with between 0 and 3 NUL bytes to make the
> >  	    chunk size a multiple of 4 bytes.
> >
> > +	Disjoint Packfiles (ID: {'D', 'I', 'S', 'P'})
> > +	    Stores a table of three 4-byte unsigned integers in network order.
> > +	    Each table entry corresponds to a single pack (in the order that
> > +	    they appear above in the `PNAM` chunk). The values for each table
> > +	    entry are as follows:
> > +	    - The first bit position (in psuedo-pack order, see below) to
>
> s/psuedo/pseudo/

Good catch, thanks. Not sure how that escaped my spell-checker...

> > +=== `DISP` chunk and disjoint packs
> > +
> > +The Disjoint Packfiles (`DISP`) chunk encodes additional information
> > +about the objects in the multi-pack index's reachability bitmap. Recall
> > +that objects from the MIDX are arranged in "pseudo-pack" order (see:
>
> The colon feels a bit out-of-place here, so: s/see:/see/

Thanks, I'll fix that up.

> > +above) for reachability bitmaps.
> > +
> > +From the example above, suppose we have packs "a", "b", and "c", with
> > +10, 15, and 20 objects, respectively. In pseudo-pack order, those would
> > +be arranged as follows:
> > +
> > +    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
> > +
> > +When working with single-pack bitmaps (or, equivalently, multi-pack
> > +reachability bitmaps without any packs marked as disjoint),
> > +linkgit:git-pack-objects[1] performs ``verbatim'' reuse, attempting to
> > +reuse chunks of the existing packfile instead of adding objects to the
> > +packing list.
>
> I'm not sure I full understand this paragraph. In the context of a
> single pack bitmap it's clear enough. But I stumbled over the MIDX case,
> because here we potentially have multiple packfiles, so it's not exactly
> clear to me what you refer to with "the existing packfile" in that case.
> I'd think that we perform verbatim reuse of the preferred packfile,
> right? If so, we might want to make that a bit more explicit.

Yep, sorry, I can see how that would be confusing. Since we're talking
about the existing behavior at this point in the series (before
multi-pack reuse is implemented), I changed this to:

  "reuse chunks of the bitmapped or preferred packfile [...]"

Thanks for carefully reading and spotting my errors ;-).

> > +object. This introduces an additional constraint over the set of packs
> > +we may want to reuse. The most straightforward approach is to mandate
> > +that the set of packs is disjoint with respect to the set of objects
> > +contained in each pack. In other words, for each object `o` in the union
> > +of all objects stored by the disjoint set of packs, `o` is contained in
> > +exactly one pack from the disjoint set.
>
> Is this a property that usually holds for our normal housekeeping, or
> does it require careful managing by the user/admin? How about geometric
> repacking?

At this point in the series, it would require careful managing to ensure
that this is the case. In practice MIDX'd packs generated with a
geometric repack are mostly disjoint, but definitely not guaranteed to
be.

Further down in this series we'll introduce new options to generate
packs which are guaranteed to be disjoint with respect to the
currently-marked set of packs in the DISP chunk.

> > @@ -764,14 +807,22 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> >  		 * Take only the first duplicate.
> >  		 */
> >  		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
> > -			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
> > -						&fanout.entries[cur_object].oid))
> > -				continue;
> > +			struct pack_midx_entry *ours = &fanout.entries[cur_object];
> > +			if (cur_object) {
> > +				struct pack_midx_entry *prev = &fanout.entries[cur_object - 1];
> > +				if (oideq(&prev->oid, &ours->oid)) {
> > +					if (prev->disjoint && ours->disjoint)
> > +						die(_("duplicate object '%s' among disjoint packs '%s', '%s'"),
> > +						    oid_to_hex(&prev->oid),
> > +						    info[prev->pack_int_id].pack_name,
> > +						    info[ours->pack_int_id].pack_name);
>
> Shouldn't we die if `prev->disjoint || ours->disjoint` instead of `&&`?
> Even if one of the packs isn't marked as disjoint, it's still wrong if
> the other one is and one of its objects exists in multiple packs.
>
> Or am I misunderstanding, and we only guarantee the disjoint property
> across packfiles that are actually marked as such?

Right, we only guarantee disjointed-ness among the set of packs that are
marked disjoint. It's fine for the same object to appear in a disjoint
and non-disjoint pack, and for both of those packs to end up in the
MIDX. But that is only because we'll use the disjoint copy in our
bitmap.

If there were two packs that are marked as supposedly disjoint, but
contain at least one duplicate of an object, then we will reject those
packs as non-disjoint.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 04/24] midx: factor out `fill_pack_info()`
From: Taylor Blau @ 2023-11-30 19:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhfXoovgYzIYE0@tanuki>

On Thu, Nov 30, 2023 at 11:18:37AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 28, 2023 at 02:08:05PM -0500, Taylor Blau wrote:
> > When selecting which packfiles will be written while generating a MIDX,
> > the MIDX internals fill out a 'struct pack_info' with various pieces of
> > book-keeping.
> >
> > Instead of filling out each field of the `pack_info` structure
> > individually in each of the two spots that modify the array of such
> > structures (`ctx->info`), extract a common routine that does this for
> > us.
> >
> > This reduces the code duplication by a modest amount. But more
> > importantly, it zero-initializes the structure before assigning values
> > into it. This hardens us for a future change which will add additional
> > fields to this structure which (until this patch) was not
> > zero-initialized.
> >
> > As a result, any new fields added to the `pack_info` structure need only
> > be updated in a single location, instead of at each spot within midx.c.
> >
> > There are no functional changes in this patch.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  midx.c | 35 +++++++++++++++++++----------------
> >  1 file changed, 19 insertions(+), 16 deletions(-)
> >
> > diff --git a/midx.c b/midx.c
> > index 3b727dc633..591b3c636e 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -464,6 +464,17 @@ struct pack_info {
> >  	unsigned expired : 1;
> >  };
> >
> > +static void fill_pack_info(struct pack_info *info,
> > +			   struct packed_git *p, char *pack_name,
> > +			   uint32_t orig_pack_int_id)
> > +{
> > +	memset(info, 0, sizeof(struct pack_info));
> > +
> > +	info->orig_pack_int_id = orig_pack_int_id;
> > +	info->pack_name = pack_name;
> > +	info->p = p;
> > +}
>
> Nit: all callers manually call `xstrdup(pack_name)` and pass that to
> `fill_pack_info()`. We could consider doing this in here instead so that
> ownership of the string becomes a tad clearer.

That's a great idea. I think we'd also want to mark the pack_name
argument as const, not just because xstrdup() requires it, but also
because it communicates the ownership more clearly.

I'll squash something like this in:

--- >8 ---
diff --git a/midx.c b/midx.c
index b8b3f41024..6fb5e237b7 100644
--- a/midx.c
+++ b/midx.c
@@ -465,13 +465,13 @@ struct pack_info {
 };

 static void fill_pack_info(struct pack_info *info,
-			   struct packed_git *p, char *pack_name,
+			   struct packed_git *p, const char *pack_name,
 			   uint32_t orig_pack_int_id)
 {
 	memset(info, 0, sizeof(struct pack_info));

 	info->orig_pack_int_id = orig_pack_int_id;
-	info->pack_name = pack_name;
+	info->pack_name = xstrdup(pack_name);
 	info->p = p;
 }

@@ -557,8 +557,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 			return;
 		}

-		fill_pack_info(&ctx->info[ctx->nr], p, xstrdup(file_name),
-			       ctx->nr);
+		fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr);
 		ctx->nr++;
 	}
 }
@@ -1336,7 +1335,7 @@ static int write_midx_internal(const char *object_dir,
 			}

 			fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
-				       xstrdup(ctx.m->pack_names[i]), i);
+				       ctx.m->pack_names[i], i);
 		}
 	}
--- 8< ---

> > -		if (open_pack_index(ctx->info[ctx->nr].p)) {
> > +		if (open_pack_index(p)) {
> >  			warning(_("failed to open pack-index '%s'"),
> >  				full_path);
> >  			close_pack(ctx->info[ctx->nr].p);
>
> Isn't `ctx->info[ctx->nr].p` still uninitialized at this point?

Great catch, thank you!

> > @@ -1330,10 +1333,10 @@ static int write_midx_internal(const char *object_dir,
> >  				if (open_pack_index(ctx.m->packs[i]))
> >  					die(_("could not open index for %s"),
> >  					    ctx.m->packs[i]->pack_name);
> > -				ctx.info[ctx.nr].p = ctx.m->packs[i];
>
> Just to make sure I'm not missing anything, but this assignment here was
> basically redundant before this patch already, right?

I think that's right, but in either case we're assigning the pack once
at the end of each loop iteration via a single call to fill_pack_info().
Since we're using ctx.m->packs[i] in both places (after a call to
prepare_midx_pack()), we should be OK here.

Thanks,
Taylor

^ permalink raw reply related

* Re: [PATCH 02/24] pack-bitmap-write: deep-clear the `bb_commit` slab
From: Taylor Blau @ 2023-11-30 19:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhd5gLzYwPEgBl@tanuki>

On Thu, Nov 30, 2023 at 11:18:31AM +0100, Patrick Steinhardt wrote:
> > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> > index f4ecdf8b0e..dd3a415b9d 100644
> > --- a/pack-bitmap-write.c
> > +++ b/pack-bitmap-write.c
> > @@ -198,6 +198,13 @@ struct bb_commit {
> >  	unsigned idx; /* within selected array */
> >  };
> >
> > +static void clear_bb_commit(struct bb_commit *commit)
> > +{
> > +	free(commit->reverse_edges);
>
> I'd have expected to see `free_commit_list()` here instead of a simple
> free. Is there any reason why we don't use it?

Thanks for spotting an oversight on my part. We should definitely be
using free_commit_list() here instead of a bare free() to avoid leaking
the tail.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 01/24] pack-objects: free packing_data in more places
From: Taylor Blau @ 2023-11-30 19:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhcv25B5nUNoyu@tanuki>

On Thu, Nov 30, 2023 at 11:18:26AM +0100, Patrick Steinhardt wrote:
> > diff --git a/pack-objects.c b/pack-objects.c
> > index f403ca6986..1c7bedcc94 100644
> > --- a/pack-objects.c
> > +++ b/pack-objects.c
> > @@ -151,6 +151,21 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata)
> >  	init_recursive_mutex(&pdata->odb_lock);
> >  }
> >
> > +void free_packing_data(struct packing_data *pdata)
>
> Nit: shouldn't this rather be called `clear_packing_data`? `free` to me
> indicates that the data structure itself will be free'd, as well, which
> is not the case.

Thanks, that's a good suggestion. I've made the changes locally and will
include it in the subsequent round(s) ;-).

Thanks,
Taylor

^ permalink raw reply

* Re: Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Taylor Blau @ 2023-11-30 19:07 UTC (permalink / raw)
  To: Jonny Grant; +Cc: git
In-Reply-To: <637be919-0b04-4e5c-8f2e-43340521e6d1@jguk.org>

On Thu, Nov 30, 2023 at 06:11:57PM +0000, Jonny Grant wrote:
> Hello
>
> May I suggest taking off the .00 KiB/s suffix, has that been
> considered? As the decimal places don't appear to change, they're
> stuck on .00.

I wonder if you have a throttled connection that is locked to 52KiB/s
exactly. The progress code that generates the throughput is in
progress.c::display_throughput(), which computes the rate. It's computed
in bytes/misec, and then passed to throughput_string() (really,
`strbuf_humanise_rate()`), which formats it appropriately.

If you're in the KiB range, it will print the decimal component, which
is:

    ((bytes & ((1<<10)-1)) * 100) >> 10

> $ git clone git://gcc.gnu.org/git/gcc.git git_1
> Cloning into 'git_1'...
> remote: Enumerating objects: 2949348, done.
> remote: Counting objects: 100% (209238/209238), done.
> remote: Compressing objects: 100% (14579/14579), done.
> Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s

On my machine:

    $ git.compile clone git://gcc.gnu.org/git/gcc.git
    [...]
    Receiving objects:  11% (342176/2949348), 108.09 MiB | 24.01 MiB/s

I suppose we could consider dropping the decimal component if it's a
round number, but I think that it may produce awkward flickering if the
rate oscillates between a round number and a non-round number.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 1/1] t2400: avoid using pipes
From: Christian Couder @ 2023-11-30 18:16 UTC (permalink / raw)
  To: Achu Luma; +Cc: git
In-Reply-To: <20231130165429.2595-2-ach.lumap@gmail.com>

Hi Luma,

On Thu, Nov 30, 2023 at 6:37 PM Achu Luma <ach.lumap@gmail.com> wrote:
>
> The exit code of the preceding command in a pipe is disregarded,
> so it's advisable to refrain from relying on it. Instead, by
> saving the output of a Git command to a file, we gain the
> ability to examine the exit codes of both commands separately.
>
> Signed-off-by: achluma <ach.lumap@gmail.com>

I think the issue with merging your patch (in
https://lore.kernel.org/git/xmqqedibzgi1.fsf@gitster.g/) was that this
"Signed-off-by: ..." line didn't show your full real name and didn't
match your name in your email address.

Assuming that "Achu Luma" is your full real name, you should replace
"achluma" with "Achu Luma" in the "Signed-off-by: ..." line.

Also it's better not to send a cover letter patch like
https://lore.kernel.org/git/20231130165429.2595-1-ach.lumap@gmail.com/
with no content for small patches like this.

When you resend, please also make sure to use [Outreachy] in the patch
subject and to increment the version number of the patch, using for
example "[PATCH v3]".

It would be nice too if after the line starting with --- below, you
could describe in a few lines the changes in the new version of the
patch compared to the previous version.

> ---

Here (after the line starting with --- above) is the place where you
can tell what changed in the patch compared to the previous version.

Note that when there is a cover letter patch, it's better to talk
about changes in the new version in the cover letter, but I dont think
it's worth sending a cover letter patch.

Thanks,
Christian.

>  t/t2400-worktree-add.sh | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
> index df4aff7825..7ead05bb98 100755
> --- a/t/t2400-worktree-add.sh
> +++ b/t/t2400-worktree-add.sh
> @@ -468,7 +468,8 @@ test_expect_success 'put a worktree under rebase' '
>                 cd under-rebase &&
>                 set_fake_editor &&
>                 FAKE_LINES="edit 1" git rebase -i HEAD^ &&
> -               git worktree list | grep "under-rebase.*detached HEAD"
> +               git worktree list >actual &&
> +               grep "under-rebase.*detached HEAD" actual
>         )
>  '
>
> @@ -509,7 +510,8 @@ test_expect_success 'checkout a branch under bisect' '
>                 git bisect start &&
>                 git bisect bad &&
>                 git bisect good HEAD~2 &&
> -               git worktree list | grep "under-bisect.*detached HEAD" &&
> +               git worktree list >actual &&
> +               grep "under-bisect.*detached HEAD" actual &&
>                 test_must_fail git worktree add new-bisect under-bisect &&
>                 ! test -d new-bisect
>         )
> --
> 2.41.0.windows.1
>
>

^ permalink raw reply

* Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Jonny Grant @ 2023-11-30 18:11 UTC (permalink / raw)
  To: git

Hello

May I suggest taking off the .00 KiB/s suffix, has that been considered? As the decimal places don't appear to change, they're stuck on .00.

I noticed the KiB/s is neatly rounded to the kibibyte.

Apologies I'm not using latest release. 
git version 2.40.1

$ git clone git://gcc.gnu.org/git/gcc.git git_1
Cloning into 'git_1'...
remote: Enumerating objects: 2949348, done.
remote: Counting objects: 100% (209238/209238), done.
remote: Compressing objects: 100% (14579/14579), done.
Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s

Meanwhile I'll keep waiting for this slow server.

Kind regards, Jonny

^ permalink raw reply

* [PATCH v2 1/1] t2400: avoid using pipes
From: Achu Luma @ 2023-11-30 16:54 UTC (permalink / raw)
  To: git; +Cc: Achu Luma
In-Reply-To: <20231130165429.2595-1-ach.lumap@gmail.com>

The exit code of the preceding command in a pipe is disregarded,
so it's advisable to refrain from relying on it. Instead, by
saving the output of a Git command to a file, we gain the
ability to examine the exit codes of both commands separately.

Signed-off-by: achluma <ach.lumap@gmail.com>
---
 t/t2400-worktree-add.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
index df4aff7825..7ead05bb98 100755
--- a/t/t2400-worktree-add.sh
+++ b/t/t2400-worktree-add.sh
@@ -468,7 +468,8 @@ test_expect_success 'put a worktree under rebase' '
 		cd under-rebase &&
 		set_fake_editor &&
 		FAKE_LINES="edit 1" git rebase -i HEAD^ &&
-		git worktree list | grep "under-rebase.*detached HEAD"
+		git worktree list >actual && 
+		grep "under-rebase.*detached HEAD" actual
 	)
 '
 
@@ -509,7 +510,8 @@ test_expect_success 'checkout a branch under bisect' '
 		git bisect start &&
 		git bisect bad &&
 		git bisect good HEAD~2 &&
-		git worktree list | grep "under-bisect.*detached HEAD" &&
+		git worktree list >actual && 
+		grep "under-bisect.*detached HEAD" actual &&
 		test_must_fail git worktree add new-bisect under-bisect &&
 		! test -d new-bisect
 	)
-- 
2.41.0.windows.1


^ permalink raw reply related

* Re: [PATCH 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Taylor Blau @ 2023-11-30 17:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, hanwenn
In-Reply-To: <ZWg8-HW6yCa6-tnS@tanuki>

On Thu, Nov 30, 2023 at 08:42:48AM +0100, Patrick Steinhardt wrote:
> > Are there any other gotchas that we should be thinking about?
>
> Not that I can think of. As you say, a repository with malformed HEAD
> will run into other problems anyway. And `read_ref_full()` would return
> errors if these refs were malformed, which would cause us to exit early
> from anyway. So unless "rebase-merge/amend" and "rebase-merge/orig-head"
> contained the same kind of garbage we'd retain the same behaviour as
> before, and that shouldn't really be happening.
>
> One interesting bit is that we don't set `RESOLVE_REF_READING`, so
> `read_ref_full()` may return successfully even if the ref doesn't exist.
> But in practice this is fine given that the resulting oid would be
> cleared in that case.

Thanks for thinking through these. I agree with your reasoning and think
that this is fine as-is.

(Off-topic, but can you please trim your replies to only include the
quoted parts/context that you're replying to?)

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 0/4] refs: improve handling of special refs
From: Taylor Blau @ 2023-11-30 17:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, hanwenn
In-Reply-To: <ZWg97gfn80jeB3-_@tanuki>

On Thu, Nov 30, 2023 at 08:46:54AM +0100, Patrick Steinhardt wrote:
> On Wed, Nov 29, 2023 at 05:14:39PM -0500, Taylor Blau wrote:
> > These all look pretty good to me. I had a few minor questions on the
> > first three patches, but I don't think they necessarily require a
> > reroll.
>
> I agree that none of the comments really require a reroll, but I'll
> address them if there will be another iteration of this patch series.
>
> Thanks for your review!

No problem on either. I doubt that there will be another iteration of
this series since it is already good, so no need to worry too much about
these changes.

Thanks,
Taylor

^ permalink raw reply

* [PATCH v2 0/1] *** Avoid using Pipes ***
From: Achu Luma @ 2023-11-30 16:54 UTC (permalink / raw)
  To: git; +Cc: Achu Luma
In-Reply-To: <20231003174853.1732-1-ach.lumap@gmail.com>

*** BLURB HERE ***

Achu Luma (1):
  t2400: avoid using pipes

 t/t2400-worktree-add.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


base-commit: d0e8084c65cbf949038ae4cc344ac2c2efd77415
-- 
2.41.0.windows.1


^ permalink raw reply

* [PATCH v2] hooks--pre-commit: detect non-ASCII when renaming
From: Julian Prein via GitGitGadget @ 2023-11-30 16:13 UTC (permalink / raw)
  To: git; +Cc: Julian Prein, Julian Prein
In-Reply-To: <pull.1291.git.git.1657837073372.gitgitgadget@gmail.com>

From: Julian Prein <druckdev@protonmail.com>

When diff.renames is turned on, the diff-filter will not return renamed
files (or copied ones with diff.renames=copy) and potential non-ASCII
characters would not be caught by this hook.

Use the plumbing command diff-index instead of the porcelain one to not
be affected by diff.rename.

Signed-off-by: Julian Prein <druckdev@protonmail.com>
---
    hooks--pre-commit: detect non-ASCII when renaming
    
    A bit later than I expected, but here is v2.
    
    Changes since v1:
    
     * Switched to using diff-index and back to just the A filter as
       suggested by Junio C Hamano

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1291%2Fdruckdev%2Fpre-commit-renames-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1291/druckdev/pre-commit-renames-v2
Pull-Request: https://github.com/git/git/pull/1291

Range-diff vs v1:

 1:  101f327a040 ! 1:  1f6ca0dd3eb hooks--pre-commit: detect non-ASCII when renaming
     @@ Metadata
       ## Commit message ##
          hooks--pre-commit: detect non-ASCII when renaming
      
     -    Currently the diff-filter that is used to check for non-ASCII characters
     -    in filenames only checks new additions.
     +    When diff.renames is turned on, the diff-filter will not return renamed
     +    files (or copied ones with diff.renames=copy) and potential non-ASCII
     +    characters would not be caught by this hook.
      
     -    Extend the diff-filter in the pre-commit sample to include `CR` as well.
     -    This way non-ASCII character in filenames are detected on a rename/copy
     -    as well.
     +    Use the plumbing command diff-index instead of the porcelain one to not
     +    be affected by diff.rename.
      
          Signed-off-by: Julian Prein <druckdev@protonmail.com>
      
     @@ templates/hooks--pre-commit.sample: if [ "$allownonascii" != "true" ] &&
       	# even required, for portability to Solaris 10's /usr/bin/tr), since
       	# the square bracket bytes happen to fall in the designated range.
      -	test $(git diff --cached --name-only --diff-filter=A -z $against |
     -+	test $(git diff --cached --name-only --diff-filter=ACR -z $against |
     ++	test $(git diff-index --cached --name-only --diff-filter=A -z $against |
       	  LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0
       then
       	cat <<\EOF


 templates/hooks--pre-commit.sample | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample
index e144712c85c..29ed5ee486a 100755
--- a/templates/hooks--pre-commit.sample
+++ b/templates/hooks--pre-commit.sample
@@ -28,7 +28,7 @@ if [ "$allownonascii" != "true" ] &&
 	# Note that the use of brackets around a tr range is ok here, (it's
 	# even required, for portability to Solaris 10's /usr/bin/tr), since
 	# the square bracket bytes happen to fall in the designated range.
-	test $(git diff --cached --name-only --diff-filter=A -z $against |
+	test $(git diff-index --cached --name-only --diff-filter=A -z $against |
 	  LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0
 then
 	cat <<\EOF

base-commit: 61a22ddaf0626111193a17ac12f366bd6d167dff
-- 
gitgitgadget

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox