Git development
 help / color / mirror / Atom feed
* Re: [PATCH 05/24] midx: implement `DISP` chunk
From: Taylor Blau @ 2023-11-30 19:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhhaXiF_wC3er7@tanuki>

On Thu, Nov 30, 2023 at 11:18:45AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> > index 9fcb29a9c8..658682ddd5 100644
> > --- a/Documentation/gitformat-pack.txt
> > +++ b/Documentation/gitformat-pack.txt
> > @@ -396,6 +396,22 @@ CHUNK DATA:
> >  	    is padded at the end with between 0 and 3 NUL bytes to make the
> >  	    chunk size a multiple of 4 bytes.
> >
> > +	Disjoint Packfiles (ID: {'D', 'I', 'S', 'P'})
> > +	    Stores a table of three 4-byte unsigned integers in network order.
> > +	    Each table entry corresponds to a single pack (in the order that
> > +	    they appear above in the `PNAM` chunk). The values for each table
> > +	    entry are as follows:
> > +	    - The first bit position (in psuedo-pack order, see below) to
>
> s/psuedo/pseudo/

Good catch, thanks. Not sure how that escaped my spell-checker...

> > +=== `DISP` chunk and disjoint packs
> > +
> > +The Disjoint Packfiles (`DISP`) chunk encodes additional information
> > +about the objects in the multi-pack index's reachability bitmap. Recall
> > +that objects from the MIDX are arranged in "pseudo-pack" order (see:
>
> The colon feels a bit out-of-place here, so: s/see:/see/

Thanks, I'll fix that up.

> > +above) for reachability bitmaps.
> > +
> > +From the example above, suppose we have packs "a", "b", and "c", with
> > +10, 15, and 20 objects, respectively. In pseudo-pack order, those would
> > +be arranged as follows:
> > +
> > +    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
> > +
> > +When working with single-pack bitmaps (or, equivalently, multi-pack
> > +reachability bitmaps without any packs marked as disjoint),
> > +linkgit:git-pack-objects[1] performs ``verbatim'' reuse, attempting to
> > +reuse chunks of the existing packfile instead of adding objects to the
> > +packing list.
>
> I'm not sure I full understand this paragraph. In the context of a
> single pack bitmap it's clear enough. But I stumbled over the MIDX case,
> because here we potentially have multiple packfiles, so it's not exactly
> clear to me what you refer to with "the existing packfile" in that case.
> I'd think that we perform verbatim reuse of the preferred packfile,
> right? If so, we might want to make that a bit more explicit.

Yep, sorry, I can see how that would be confusing. Since we're talking
about the existing behavior at this point in the series (before
multi-pack reuse is implemented), I changed this to:

  "reuse chunks of the bitmapped or preferred packfile [...]"

Thanks for carefully reading and spotting my errors ;-).

> > +object. This introduces an additional constraint over the set of packs
> > +we may want to reuse. The most straightforward approach is to mandate
> > +that the set of packs is disjoint with respect to the set of objects
> > +contained in each pack. In other words, for each object `o` in the union
> > +of all objects stored by the disjoint set of packs, `o` is contained in
> > +exactly one pack from the disjoint set.
>
> Is this a property that usually holds for our normal housekeeping, or
> does it require careful managing by the user/admin? How about geometric
> repacking?

At this point in the series, it would require careful managing to ensure
that this is the case. In practice MIDX'd packs generated with a
geometric repack are mostly disjoint, but definitely not guaranteed to
be.

Further down in this series we'll introduce new options to generate
packs which are guaranteed to be disjoint with respect to the
currently-marked set of packs in the DISP chunk.

> > @@ -764,14 +807,22 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> >  		 * Take only the first duplicate.
> >  		 */
> >  		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
> > -			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
> > -						&fanout.entries[cur_object].oid))
> > -				continue;
> > +			struct pack_midx_entry *ours = &fanout.entries[cur_object];
> > +			if (cur_object) {
> > +				struct pack_midx_entry *prev = &fanout.entries[cur_object - 1];
> > +				if (oideq(&prev->oid, &ours->oid)) {
> > +					if (prev->disjoint && ours->disjoint)
> > +						die(_("duplicate object '%s' among disjoint packs '%s', '%s'"),
> > +						    oid_to_hex(&prev->oid),
> > +						    info[prev->pack_int_id].pack_name,
> > +						    info[ours->pack_int_id].pack_name);
>
> Shouldn't we die if `prev->disjoint || ours->disjoint` instead of `&&`?
> Even if one of the packs isn't marked as disjoint, it's still wrong if
> the other one is and one of its objects exists in multiple packs.
>
> Or am I misunderstanding, and we only guarantee the disjoint property
> across packfiles that are actually marked as such?

Right, we only guarantee disjointed-ness among the set of packs that are
marked disjoint. It's fine for the same object to appear in a disjoint
and non-disjoint pack, and for both of those packs to end up in the
MIDX. But that is only because we'll use the disjoint copy in our
bitmap.

If there were two packs that are marked as supposedly disjoint, but
contain at least one duplicate of an object, then we will reject those
packs as non-disjoint.

Thanks,
Taylor

^ permalink raw reply

* Re: Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Dragan Simic @ 2023-11-30 19:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonny Grant, git
In-Reply-To: <ZWjdatp3SRb4mN6G@nand.local>

On 2023-11-30 20:07, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 06:11:57PM +0000, Jonny Grant wrote:
>> $ git clone git://gcc.gnu.org/git/gcc.git git_1
>> Cloning into 'git_1'...
>> remote: Enumerating objects: 2949348, done.
>> remote: Counting objects: 100% (209238/209238), done.
>> remote: Compressing objects: 100% (14579/14579), done.
>> Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s
> 
> On my machine:
> 
>     $ git.compile clone git://gcc.gnu.org/git/gcc.git
>     [...]
>     Receiving objects:  11% (342176/2949348), 108.09 MiB | 24.01 MiB/s
> 
> I suppose we could consider dropping the decimal component if it's a
> round number, but I think that it may produce awkward flickering if the
> rate oscillates between a round number and a non-round number.

You're right, the resulting flickering would look really annoying.  In 
fact, I already modified the reported download speed in another project 
to avoid pretty much the same kind of flickering, and it looked much 
better without it.

^ permalink raw reply

* Re: [PATCH 07/24] midx: implement `--retain-disjoint` mode
From: Taylor Blau @ 2023-11-30 19:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhi15VpeCRflDB@tanuki>

On Thu, Nov 30, 2023 at 11:18:51AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index d130e65b28..ac0c7b124b 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -54,6 +54,14 @@ write::
> >  		"disjoint". See the "`DISP` chunk and disjoint packs"
> >  		section in linkgit:gitformat-pack[5] for more.
> >
> > +	--retain-disjoint::
> > +		When writing a multi-pack index with a reachability
> > +		bitmap, keep any packs marked as disjoint in the
> > +		existing MIDX (if any) as such in the new MIDX. Existing
> > +		disjoint packs which are removed (e.g., not listed via
> > +		`--stdin-packs`) are ignored. This option works in
> > +		addition to the '+' marker for `--stdin-packs`.
>
> I'm trying to understand when you're expected to pass this flag and when
> you're expected not to pass it. This documentation could also help in
> the documentation here so that the user can make a more informed
> decision.

I think there are multiple reasons that you may or may not want to pass
that flag. Certainly if you're not using disjoint packs (and instead
only care about single-pack verbatim reuse over the MIDX's preferred
packfile), then you don't need to pass it.

But if you are using disjoint packs, you may want to pass it if you are
adding packs to the MIDX which are disjoint, _and_ you want to hold onto
the existing set of disjoint packs.

But if you want to change the set of disjoint packs entirely, you would
want to omit this flag (unless you knew a-priori that you were going to
drop all of the currently marked disjoint packs from the new MIDX you
are writing, e.g. with --stdin-packs).

If you think it would be useful, I could try and distill some of this
down, but I think that there is likely too much detail here for it to be
useful in user-facing documentation.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 08/24] pack-objects: implement `--ignore-disjoint` mode
From: Taylor Blau @ 2023-11-30 19:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhhkdnVZ9w7tDBv@tanuki>

On Thu, Nov 30, 2023 at 11:18:57AM +0100, Patrick Steinhardt wrote:
> > Instead, teach `pack-objects` a special `--ignore-disjoint` which is the
> > moral equivalent of marking the set of disjoint packs as kept, and
> > ignoring their contents, even if it would have otherwise been packed. In
> > fact, this similarity extends down to the implementation, where each
> > disjoint pack is first loaded, then has its `pack_keep_in_core` bit set.
> >
> > With this in place, we can use the kept-pack cache from 20b031fede
> > (packfile: add kept-pack cache for find_kept_pack_entry(), 2021-02-22),
> > which looks up objects first in a cache containing just the set of kept
> > (in this case, disjoint) packs. Assuming that the set of disjoint packs
> > is a relatively small portion of the entire repository (which should be
> > a safe assumption to make), each object lookup will be very inexpensive.
>
> This cought me by surprise a bit. I'd have expected that in the end,
> most of the packfiles in a repository would be disjoint. Using for
> example geometric repacks, my expectation was that all of the packs that
> get written via geometric repacking would eventually become disjoint
> whereas new packs added to the repository would initially not be.

Which part are you referring to here? If you're referring to the part
where I say that the set of disjoint packs is relatively small in
proposition to the rest of the packs, I think I know where the confusion
is.

I'm not saying that the set of disjoint packs is small in comparison to
the rest of the repository by object count, but rather by count of packs
overall. You're right that packs from pushes will not be guaranteed to
be disjoint upon entering the repository, but will become disjoint when
geometrically repacked (assuming that the caller uses --ignore-disjoint
when repacking).

Thanks,
Taylor

^ permalink raw reply

* Minor UX annoyance w/`git add --patch untracked/file`
From: Vito Caputo @ 2023-11-30 19:26 UTC (permalink / raw)
  To: git

Hello list,

Couldn't the following two steps be done automagically by --patch:

```
git add -N path/to/untracked/file/wishing/to/partially/add
git add --patch path/to/untracked/file/wishing/to/partially/add
```

when one simply does:

`git add --patch path/to/untracked/file/wishing/to/partially/add`

?

Cheers,
Vito Caputo

^ permalink raw reply

* Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
From: Taylor Blau @ 2023-11-30 19:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWhha2h2zzZWCXrw@tanuki>

On Thu, Nov 30, 2023 at 11:18:19AM +0100, Patrick Steinhardt wrote:
> > Performing verbatim pack reuse naturally trades off between CPU time and
> > the resulting pack size. In the above example, the single-pack reuse
> > case produces a clone size of ~194 MB on my machine, while the
> > multi-pack reuse case produces a clone size closer to ~266 MB, which is
> > a ~37% increase in clone size.
>
> Quite exciting, and a tradeoff that may be worth it for Git hosters. I
> expect that this is going to be an extreme example of the benefits
> provided by your patch series -- do you by any chance also have "real"
> numbers that make it possible to quantify the effect a bit better?
>
> No worry if you don't, I'm just curious.

I don't have a great sense, no. I haven't run these patches yet in
production, although would like to do so soon for internal repositories
to get a better sense here.

There are some performance tests at the end which try and give you a
sense of at least the relative speed-up depending on how many disjoint
packs you have (IIRC, we test for 1, 10, and 100 disjoint packs).

> > I think there is still some opportunity to close this gap, since the
> > "packing" strategy here is extremely naive. In a production setting, I'm
> > sure that there are more well thought out repacking strategies that
> > would produce more similar clone sizes.
> >
> > I considered breaking this series up into smaller chunks, but was
> > unsatisfied with the result. Since this series is rather large, if you
> > have alternate suggestions on better ways to structure this, please let
> > me know.
>
> The series is indeed very involved to review. I only made it up to patch
> 8/24 and already spent quite some time on it. So I'd certainly welcome
> it if this was split up into smaller parts, but don't have a suggestion
> as to how this should be done (also because I didn't yet read the other
> 16 patches).

I suppose that one way to break it up might be:

    pack-objects: free packing_data in more places
    pack-bitmap-write: deep-clear the `bb_commit` slab
    pack-bitmap: plug leak in find_objects()

    midx: factor out `fill_pack_info()`
    midx: implement `DISP` chunk
    midx: implement `midx_locate_pack()`
    midx: implement `--retain-disjoint` mode

    pack-objects: implement `--ignore-disjoint` mode
    repack: implement `--extend-disjoint` mode

    pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
    pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
    pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
    pack-objects: parameterize pack-reuse routines over a single pack
    pack-objects: keep track of `pack_start` for each reuse pack
    pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
    pack-objects: prepare `write_reused_pack()` for multi-pack reuse
    pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
    pack-objects: include number of packs reused in output
    pack-bitmap: prepare to mark objects from multiple packs for reuse

    pack-objects: add tracing for various packfile metrics
    t/test-lib-functions.sh: implement `test_trace2_data` helper
    pack-objects: allow setting `pack.allowPackReuse` to "single"
    pack-bitmap: reuse objects from all disjoint packs
    t/perf: add performance tests for multi-pack reuse

Then you'd have five patch series, where each series does roughly the
following:

  1. Preparatory clean-up.
  2. Implementing the DISP chunk, as well as --retain-disjoint, without
     a way to generate such packs.
  3. Implement a way to generate such packs, but without actually being
     able to reuse more than one of them.
  4. Implement multi-pack reuse, but without actually reusing any packs.
  5. Enable multi-pack reuse (and implement the required scaffolding to
     do so), and test it.

That's the most sensible split that I could come up with, at least. But
I still find it relatively unsatisfying for a couple of reasons:

  - With the exception of the last group of patches, none of the earlier
    series enable any new, useful behavior on their own. IOW, if we just
    merged the first three series and then forgot about this topic, we
    wouldn't have done anything useful ;-).

  - The fourth series (which actually implements multi-pack reuse) still
    remains the most complicated, and would likely be the most difficult
    to review. So I think you'd still have one difficult series to
    review, plus four other series which are slightly less difficult to
    review ;-).

It's entirely possible that I'm just too close to these patches to see a
better split, so if you (or others) have any suggestions on how to break
this up, please don't hesitate to share them.

> I'll review the remaining patches at a later point in time.

Thanks, I'll look forward to more of your review as usual :-).

Thanks,
Taylor

^ permalink raw reply

* Re: Consider dropping the decimal places for KiB/s 52.00 KiB/s
From: Jonny Grant @ 2023-11-30 20:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZWjdatp3SRb4mN6G@nand.local>



On 30/11/2023 19:07, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 06:11:57PM +0000, Jonny Grant wrote:
>> Hello
>>
>> May I suggest taking off the .00 KiB/s suffix, has that been
>> considered? As the decimal places don't appear to change, they're
>> stuck on .00.
> 
> I wonder if you have a throttled connection that is locked to 52KiB/s
> exactly.

You're right - I changed to https and it went to 6 MiB/s. Seems throttled on git://
I should have checked that first.

>  The progress code that generates the throughput is in
> progress.c::display_throughput(), which computes the rate. It's computed
> in bytes/misec, and then passed to throughput_string() (really,
> `strbuf_humanise_rate()`), which formats it appropriately.
> 
> If you're in the KiB range, it will print the decimal component, which
> is:
> 
>     ((bytes & ((1<<10)-1)) * 100) >> 10
> 
>> $ git clone git://gcc.gnu.org/git/gcc.git git_1
>> Cloning into 'git_1'...
>> remote: Enumerating objects: 2949348, done.
>> remote: Counting objects: 100% (209238/209238), done.
>> remote: Compressing objects: 100% (14579/14579), done.
>> Receiving objects:   7% (210878/2949348), 76.33 MiB | 52.00 KiB/s
> 
> On my machine:
> 
>     $ git.compile clone git://gcc.gnu.org/git/gcc.git
>     [...]
>     Receiving objects:  11% (342176/2949348), 108.09 MiB | 24.01 MiB/s
> 
> I suppose we could consider dropping the decimal component if it's a
> round number, but I think that it may produce awkward flickering if the
> rate oscillates between a round number and a non-round number.

Now I can see the advantage of it as it is. wget is similar.
Kind regards, Jonny



^ permalink raw reply

* [PATCH 0/2] completion: refactor and support reftables backend
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu

Hi,

This patch series refactors and updates git-completion.bash to become
compatible with the upcoming reftables backend.

Stan Hu (2):
  completion: refactor existence checks for special refs
  completion: stop checking for reference existence via `test -f`

 contrib/completion/git-completion.bash | 43 +++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 5 deletions(-)

-- 
2.42.0


^ permalink raw reply

* [PATCH 1/2] completion: refactor existence checks for special refs
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <20231130202404.89791-1-stanhu@gmail.com>

In preparation for the reftable backend, this commit introduces a
'__git_ref_exists' function that continues to use 'test -f' to
determine whether a given ref exists in the local filesystem.

Each caller of '__git_ref_exists' must call '__git_find_repo_path`
first. '_git_restore' already used 'git rev-parse HEAD', but to use
'__git_ref_exists' insert a '__git_find_repo_path' at the start.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 13a39ebd2e..9fbdc13f9a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,6 +122,15 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Runs git in $__git_repo_path to determine whether a ref exists.
+# 1: The ref to search
+__git_ref_exists ()
+{
+	local ref=$1
+
+	[ -f "$__git_repo_path/$ref" ]
+}
+
 # Removes backslash escaping, single quotes and double quotes from a word,
 # stores the result in the variable $dequoted_word.
 # 1: The word to dequote.
@@ -1625,7 +1634,7 @@ __git_cherry_pick_inprogress_options=$__git_sequencer_inprogress_options
 _git_cherry_pick ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/CHERRY_PICK_HEAD ]; then
+	if __git_ref_exists CHERRY_PICK_HEAD; then
 		__gitcomp "$__git_cherry_pick_inprogress_options"
 		return
 	fi
@@ -2067,7 +2076,7 @@ _git_log ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_ref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$prev,$cur" in
@@ -2934,6 +2943,7 @@ _git_reset ()
 
 _git_restore ()
 {
+	__git_find_repo_path
 	case "$prev" in
 	-s)
 		__git_complete_refs
@@ -2952,7 +2962,7 @@ _git_restore ()
 		__gitcomp_builtin restore
 		;;
 	*)
-		if __git rev-parse --verify --quiet HEAD >/dev/null; then
+		if __git_ref_exists HEAD; then
 			__git_complete_index_file "--modified"
 		fi
 	esac
@@ -2963,7 +2973,7 @@ __git_revert_inprogress_options=$__git_sequencer_inprogress_options
 _git_revert ()
 {
 	__git_find_repo_path
-	if [ -f "$__git_repo_path"/REVERT_HEAD ]; then
+	if __git_ref_exists REVERT_HEAD; then
 		__gitcomp "$__git_revert_inprogress_options"
 		return
 	fi
@@ -3592,7 +3602,7 @@ __gitk_main ()
 	__git_find_repo_path
 
 	local merge=""
-	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+	if __git_ref_exists MERGE_HEAD; then
 		merge="--merge"
 	fi
 	case "$cur" in
-- 
2.42.0


^ permalink raw reply related

* [PATCH 2/2] completion: stop checking for reference existence via `test -f`
From: Stan Hu @ 2023-11-30 20:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Christian Couder, Stan Hu
In-Reply-To: <20231130202404.89791-2-stanhu@gmail.com>

In contrib/completion/git-completion.bash, there are a bunch of
instances where we read special refs like HEAD, MERGE_HEAD,
REVERT_HEAD, and others via the filesystem. However, the upcoming
reftable refs backend won't use '.git/HEAD' at all but instead will
write an invalid refname as placeholder for backwards compatibility,
which will break the git-completion script.

Update the '__git_ref_exists' function to:

1. Recognize the placeholder '.git/HEAD' written by the reftable
   backend (its content is specified in the reftable specs).
2. If reftable is in use, use 'git rev-parse' to determine whether the
    given ref exists.
3. Otherwise, continue to use 'test -f' to check for the ref's filename.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Stan Hu <stanhu@gmail.com>
---
 contrib/completion/git-completion.bash | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 9fbdc13f9a..f5b630ba99 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -122,12 +122,35 @@ __git ()
 		${__git_dir:+--git-dir="$__git_dir"} "$@" 2>/dev/null
 }
 
+# Helper function to read the first line of a file into a variable.
+# __git_eread requires 2 arguments, the file path and the name of the
+# variable, in that order.
+#
+# This is taken from git-prompt.sh.
+__git_eread ()
+{
+	test -r "$1" && IFS=$'\r\n' read -r "$2" <"$1"
+}
+
 # Runs git in $__git_repo_path to determine whether a ref exists.
 # 1: The ref to search
 __git_ref_exists ()
 {
 	local ref=$1
 
+	# If the reftable is in use, we have to shell out to 'git rev-parse'
+	# to determine whether the ref exists instead of looking directly in
+	# the filesystem to determine whether the ref exists. Otherwise, use
+	# Bash builtins since executing Git commands are expensive on some
+	# platforms.
+	if __git_eread "$__git_repo_path/HEAD" head; then
+		b="${head#ref: }"
+		if [ "$b" == "refs/heads/.invalid" ]; then
+			__git -C "$__git_repo_path" rev-parse --verify --quiet "$ref" 2>/dev/null
+			return $?
+		fi
+	fi
+
 	[ -f "$__git_repo_path/$ref" ]
 }
 
-- 
2.42.0


^ permalink raw reply related

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
From: Christian Couder @ 2023-11-30 20:33 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore
In-Reply-To: <pull.1587.v5.git.1701206267300.gitgitgadget@gmail.com>

On Tue, Nov 28, 2023 at 10:17 PM Zach FettersMoore via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> To see this in practice you can use the open source GitHub repo
> 'apollo-ios-dev' and do the following in order:
>
> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

It looks like there is a spurious A after 'apollo-ios' in the line above.

>  directories
> -Create a commit containing these changes
> -Do a split on apollo-ios-codegen
>    - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

I might be doing something stupid or wrong, but I get the following:

$ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
fatal: could not rev-parse split hash
cc70a7d49e84696f0df210710445784c504ed748 from commit
360f068ea0d57f250621ab7dbe205313f52a0e98
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

> -Do a split on apollo-ios
>    - git subtree split --prefix=apollo-ios --squash --rejoin

Same issue:

$ git subtree split --prefix=apollo-ios --squash --rejoin
fatal: could not rev-parse split hash
b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111

I didn't try to get farther than this, as it seems that some
instructions might be missing.

[...]

> So this commit makes a change to the processing of commits for the
> split command in order to ignore non-mainline commits from other
> subtrees such as apollo-ios in the above breakdown by adding a new
> function 'should_ignore_subtree_commit' which is called during
> 'process_split_commit'. This allows the split/rejoin processing to
> still function as expected but removes all of the unnecessary
> processing that takes place currently which greatly inflates the
> processing time. In the above example, previously the final split
> would take ~10-12 minutes, while after this fix it takes seconds.

Nice!

Except for the above issues in the commit message, the rest of the
patch looks good to me, thanks!

^ permalink raw reply

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
From: Zach FettersMoore @ 2023-11-30 21:01 UTC (permalink / raw)
  To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git
In-Reply-To: <CAP8UFD1rd+q-dC_w2VgZ_jC++LDeF6gu5wDcbQzSuhU1ksfBpA@mail.gmail.com>

>> To see this in practice you can use the open source GitHub repo
>> 'apollo-ios-dev' and do the following in order:
>>
>> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

> It looks like there is a spurious A after 'apollo-ios' in the line above.

Thanks for catching that, definitely a typo on my part.

>> directories
>> -Create a commit containing these changes
>> -Do a split on apollo-ios-codegen
>> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

> I might be doing something stupid or wrong, but I get the following:
>
> $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> fatal: could not rev-parse split hash
> cc70a7d49e84696f0df210710445784c504ed748 from commit
> 360f068ea0d57f250621ab7dbe205313f52a0e98
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

Updated this to include doing a fetch to ensure all remote repo
info is available locally.

>> -Do a split on apollo-ios
>> - git subtree split --prefix=apollo-ios --squash --rejoin

> Same issue:
>
> $ git subtree split --prefix=apollo-ios --squash --rejoin
> fatal: could not rev-parse split hash
> b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
> b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111
>
> I didn't try to get farther than this, as it seems that some
> instructions might be missing.

Same as above, added extra instruction to do a fetch first.

Also added a little extra info that the issue may present after the
first split in the instructions depending on the current state of the
repo being used. Also added a way to do the same steps with the changes
applied to see that it resolves the issue.

>> So this commit makes a change to the processing of commits for the
>> split command in order to ignore non-mainline commits from other
>> subtrees such as apollo-ios in the above breakdown by adding a new
>> function 'should_ignore_subtree_commit' which is called during
>> 'process_split_commit'. This allows the split/rejoin processing to
>> still function as expected but removes all of the unnecessary
>> processing that takes place currently which greatly inflates the
>> processing time. In the above example, previously the final split
>> would take ~10-12 minutes, while after this fix it takes seconds.

> Nice!
>
> Except for the above issues in the commit message, the rest of the
> patch looks good to me, thanks!

Great! Thanks for the review and guidance!

^ permalink raw reply

* (no subject)
From: 5598162950 @ 2023-11-30 21:46 UTC (permalink / raw)
  To: 5595721311, 18884493239, git

[-- Attachment #1: text_0.txt --]
[-- Type: text/plain, Size: 66 bytes --]

Someone’s changing my stuff bad mom christal or Victoria or you!

^ permalink raw reply

* Re: "git overlay" - command for overlaying branches
From: Oliver Bandel @ 2023-12-01  0:11 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: git
In-Reply-To: <20231125105046.GB9696@kitsune.suse.cz>

What I basically want to do is:

- Working on more than one branch at the same time.
- Having all checked-out branches together in one working dir.
- When adding/committing, all checked-out files will be added/committed
  to those branches, from where they were checked out.

This way I could edit multiple branches simultaneously independently without the need of merging.
Merging can be done later on, if needed.

Possible alternatives?

For the multiple checkouts I could do a workaround and use a local bare-repo
and check out each branch of interest into a seperate directory.
I could then edit the files from the different dirs.

But for the overlaying thing, to get the files into one dir
(with the automatic hiding of equally named files from branches that
were overlayed before) it would need something like a FUSE-based
tool, so that all the branches could be blended into one working dir.

A "git overlay" would make this easier.
No extra bare-repo needed, no additional tool for blending the dirs
needed.

Ciao,
  Oliver

^ permalink raw reply

* Re: [PATCH] Mention default oldbranch in git-branch doc
From: Rubén Justo @ 2023-12-01  1:25 UTC (permalink / raw)
  To: Clarence "Sparr" Risher via GitGitGadget, git
  Cc: Clarence "Sparr" Risher
In-Reply-To: <pull.1613.git.git.1701303891791.gitgitgadget@gmail.com>

On 30-nov-2023 00:24:51, Clarence "Sparr" Risher via GitGitGadget wrote:
> From: "Clarence \"Sparr\" Risher" <sparr0@gmail.com>
> 
> `git branch -m` has an optional first argument, the branch to rename.
> If omitted, it defaults to the current branch. This behavior is not
> currently described in the documentation.

Good catch.

> While it can be determined
> relatively easily through experimentation, and less so through viewing
> the source code (builtin/branch.c:897)

Some people will find it easier by reading the code than through
experimentation ;-)

> it is not obvious what such a
> command will do when encountered in a less interactive context.

Certainly, I agree.

> 
> This patch adds one sentence to the git-branch documentation, with
> wording based on another optional argument described earlier in the
> same doc.
> 
> The same behavior applies to -M, -c, and -C

Good.

> 
> Signed-off-by: Clarence Risher <clarence@kira-learning.com>
> Signed-off-by: Clarence "Sparr" Risher <sparr0@gmail.com>

Are you both?  If that is the case, IMO it is unusual in this project to
have two different signatures for the same person in one commit.

> 
> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 4395aa93543..affed1b17a4 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -73,6 +73,7 @@ overridden by using the `--track` and `--no-track` options, and
>  changed later using `git branch --set-upstream-to`.
>  
>  With a `-m` or `-M` option, <oldbranch> will be renamed to <newbranch>.
> +If the <oldbranch> argument is missing it defaults to the current branch.

Sounds good.  But IMHO this information fits better below, where the
term "<oldbranch>" is described.  Something like:

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 4395aa9354..9c17df9f4d 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -312,7 +312,8 @@ superproject's "origin/main", but tracks the submodule's "origin/main".
        option is omitted, the current HEAD will be used instead.

 <oldbranch>::
-       The name of an existing branch to rename.
+       The name of an existing branch to rename or copy.  If this
+       option is omitted, the current branch will be used instead.

Completing while we're here, what we missed when -c was introduced in
52d59cc645 (branch: add a --copy (-c) option to go with --move (-m),
2017-06-18).

>  If <oldbranch> had a corresponding reflog, it is renamed to match
>  <newbranch>, and a reflog entry is created to remember the branch
>  renaming. If <newbranch> exists, -M must be used to force the rename
> 
> base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
> -- 
> gitgitgadget


Thank you for working on this.

^ permalink raw reply related

* Re: [PATCH 1/1] attr: add builtin objectmode values support
From: Joanna Wang @ 2023-12-01  4:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sunshine, tboegi
In-Reply-To: <xmqqil62qfvr.fsf@gitster.g>

> Cumulatively, aside from the removal of the t/#t* file, here is what
> I ended up with so far.

I want to double check if I should followup here.
I assumed that you had already applied these final fixes on my behalf,
similar to my patch for enabling attr for `git-add`. But if I was wrong,
I'm happy to send another update with all the fixes.


On Thu, Nov 16, 2023 at 4:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Other than the removal of that file, I locally applied the following
> > fix-up while checking the difference relative to the previous
> > iteration.
>
> Cumulatively, aside from the removal of the t/#t* file, here is what
> I ended up with so far.
>
> Subject: [PATCH] SQUASH???
>
> ---
>  Documentation/gitattributes.txt |  2 +-
>  neue                            |  0
>  t/t0003-attributes.sh           |  5 +++--
>  t/t6135-pathspec-with-attrs.sh  | 10 ++++++----
>  4 files changed, 10 insertions(+), 7 deletions(-)
>  create mode 100644 neue
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 784aa9d4de..201bdf5edb 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -108,7 +108,7 @@ user defined attributes under this namespace will be ignored and
>  trigger a warning.
>
>  `builtin_objectmode`
> -^^^^^^^^^^^^^^^^^^^^
> +~~~~~~~~~~~~~~~~~~~~
>  This attribute is for filtering files by their file bit modes (40000,
>  120000, 160000, 100755, 100644). e.g. ':(attr:builtin_objectmode=160000)'.
>  You may also check these values with `git check-attr builtin_objectmode -- <file>`.
> diff --git a/neue b/neue
> new file mode 100644
> index 0000000000..e69de29bb2
> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index 86f8681570..774b52c298 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -580,12 +580,13 @@ test_expect_success 'builtin object mode attributes work (dir and regular paths)
>  '
>
>  test_expect_success POSIXPERM 'builtin object mode attributes work (executable)' '
> -       >exec && chmod +x exec &&
> +       >exec &&
> +       chmod +x exec &&
>         attr_check_object_mode exec 100755
>  '
>
>  test_expect_success SYMLINKS 'builtin object mode attributes work (symlinks)' '
> -       >to_sym ln -s to_sym sym &&
> +       ln -s to_sym sym &&
>         attr_check_object_mode sym 120000
>  '
>
> diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
> index b08a32ea68..f6403ebbda 100755
> --- a/t/t6135-pathspec-with-attrs.sh
> +++ b/t/t6135-pathspec-with-attrs.sh
> @@ -295,22 +295,24 @@ test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
>         test_cmp expect actual
>  '
>
> -test_expect_success 'pathspec with builtin_objectmode attr can be used' '
> +test_expect_success POSIXPERM 'pathspec with builtin_objectmode attr can be used' '
>         >mode_exec_file_1 &&
>
>         git status -s ":(attr:builtin_objectmode=100644)mode_exec_*" >actual &&
>         echo ?? mode_exec_file_1 >expect &&
>         test_cmp expect actual &&
>
> -       git add mode_exec_file_1 && chmod +x mode_exec_file_1 &&
> +       git add mode_exec_file_1 &&
> +       chmod +x mode_exec_file_1 &&
>         git status -s ":(attr:builtin_objectmode=100755)mode_exec_*" >actual &&
>         echo AM mode_exec_file_1 >expect &&
>         test_cmp expect actual
>  '
>
> -test_expect_success 'builtin_objectmode attr can be excluded' '
> +test_expect_success POSIXPERM 'builtin_objectmode attr can be excluded' '
>         >mode_1_regular &&
> -       >mode_1_exec  && chmod +x mode_1_exec &&
> +       >mode_1_exec  &&
> +       chmod +x mode_1_exec &&
>         git status -s ":(exclude,attr:builtin_objectmode=100644)" "mode_1_*" >actual &&
>         echo ?? mode_1_exec >expect &&
>         test_cmp expect actual &&
> --
> 2.43.0-rc2
>

^ permalink raw reply

* Re: [PATCH 3/4] refs: complete list of special refs
From: Patrick Steinhardt @ 2023-12-01  6:43 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, hanwenn
In-Reply-To: <15f67e21-c05f-4a72-9557-2a09a1311f25@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4298 bytes --]

On Thu, Nov 30, 2023 at 03:42:06PM +0000, Phillip Wood wrote:
> Hi Patrick
> 
> Thanks for working on this. I've left a couple of thought below.
> 
> On 29/11/2023 08:14, Patrick Steinhardt wrote:
> > +static int is_special_ref(const char *refname)
> > +{
> > +	/*
> > +	 * Special references get written and read directly via the filesystem
> > +	 * by the subsystems that create them. Thus, they must not go through
> > +	 * the reference backend but must instead be read directly. It is
> > +	 * arguable whether this behaviour is sensible, or whether it's simply
> > +	 * a leaky abstraction enabled by us only having a single reference
> > +	 * backend implementation. But at least for a subset of references it
> > +	 * indeed does make sense to treat them specially:
> > +	 *
> > +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> > +	 *   carries additional metadata like where it came from.
> > +	 *
> > +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> > +	 *   heads.
> > +	 *
> > +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> > +	 *   rebases, where keeping it closely together feels sensible.
> 
> I'd really like to get away from treating these files as refs. I think their
> use as refs is purely historic and predates the reflog and possibly
> ORIG_HEAD. These days I'm not sure there is a good reason to be running
> 
>     git rev-parse rebase-merge/orig-head
> 
> One reason for not wanting to treat them as refs is that we do not handle
> multi-level refs that do not begin with "refs/" consistently.
> 
>     git update-ref foo/bar HEAD
> 
> succeeds and creates .git/foo/bar but
> 
>     git update-ref -d foo/bar
> 
> fails with
> 
>     error: refusing to update ref with bad name 'foo/bar'
> 
> To me it would make sense to refuse to create 'foo/bar' but allow an
> existing ref named 'foo/bar' to be deleted but the current behavior is the
> opposite of that.
> 
> I'd be quite happy to see us refuse to treat anything that fails
> 
>     if (starts_with(refname, "refs/") || refname_is_safe(refname))
> 
> as a ref but I don't know how much pain that would cause.

Well, we already do use these internally as references, but I don't
disagree with you. I think the current state is extremely confusing,
which is why my first approach was to simply document what falls into
the category of these "special" references.

In my mind, this patch series here is a first step towards addressing
the problem more generally. For now it is more or less only documenting
_what_ is a special ref and why they are special, while also ensuring
that these refs are compatible with the reftable backend. But once this
lands, I'd certainly want to see us continue to iterate on this.

Most importantly, I'd love to see us address two issues:

  - Start to refuse writing these special refs via the refdb so that
    the rules I've now layed out are also enforced. This would also
    address your point about things being inconsistent.

  - Gradually reduce the list of special refs so that they are reduced
    to a bare minimum and so that most refs are simply that, a normal
    ref.

> > +	const char * const special_refs[] = {
> > +		"AUTO_MERGE",
> 
> Is there any reason to treat this specially in the long term? It points to a
> tree rather than a commit but unlike MERGE_HEAD and FETCH_HEAD it is
> effectively a "normal" ref.

No, I'd love to see this and others converted to become a normal ref
eventually. The goal of this patch series was mostly to document what we
already have, and address those cases which are inconsistent with the
new rules. But I'd be happy to convert more of these special refs to
become normal refs after it lands.

> > +		"BISECT_EXPECTED_REV",
> > +		"FETCH_HEAD",
> > +		"MERGE_AUTOSTASH",
> 
> Should we be treating this as a ref? I thought it was written as an
> implementation detail of the autostash implementation rather than to provide
> a ref for users and scripts.

Yes, we have to in the context of the reftable backend. There's a bunch
of tests that exercise our ability to parse this as a ref, and they
would otherwise fail with the reftable backend.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] git-prompt: stop manually parsing HEAD
From: Patrick Steinhardt @ 2023-12-01  7:34 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Eric Sunshine, git, Stan Hu
In-Reply-To: <20231124182803.GA11157@szeder.dev>

[-- Attachment #1: Type: text/plain, Size: 3976 bytes --]

On Fri, Nov 24, 2023 at 07:28:03PM +0100, SZEDER Gábor wrote:
> On Fri, Nov 24, 2023 at 01:09:03PM -0500, Eric Sunshine wrote:
> > On Fri, Nov 24, 2023 at 6:37 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > We're manually parsing the HEAD reference in git-prompt to figure out
> > > whether it is a symbolic or direct reference. This makes it intimately
> > > tied to the on-disk format we use to store references and will stop
> > > working once we gain additional reference backends in the Git project.
> > >
> > > Refactor the code to always use git-symbolic-ref(1) to read HEAD, which
> > > is both simpler and compatible with alternate reference backends.
> > 
> > This may get some push-back from Windows folks due to high
> > process-creation cost on that platform. As I recall, over the years, a
> > good deal of effort has been put into reducing the number of programs
> > run each time the prompt is displayed, precisely because invoking Git
> > (or other programs) multiple times became unbearably slow. In
> > particular, optimizations efforts have focussed on computing as much
> > as possible within the shell itself rather than invoking external
> > programs for the same purpose. Thus, this seems to be taking a step
> > backwards in that regard for the common or status quo case.
> > 
> > Would it be possible instead to, within shell, detect if the historic
> > file-based backend is being used in the current repository, thus
> > continue using the existing shell code for that case, and only employ
> > git-symbolic-ref if some other backend is in use?
> 
> Thanks for sharing my worries :)
> 
> I sent a patch a while ago to Han-Wen to make our Bash prompt script
> work with the reftable backend without incurring the overhead of extra
> subshells or processes when using the files based refs backend.  He
> picked it up and used to include it in rerolls of the reftable patch
> series; the last version of that patch is I believe at:
> 
>   https://public-inbox.org/git/patch-v4-21.28-443bdebfb5d-20210823T120208Z-avarab@gmail.com/

Fair enough, I'm sure I can roll something similar into my patch series.
I do wonder whether it's fine to already submit those patches now where
the reftable backend doesn't exist yet. But I'd hope so, because it
would make it significantly easier for us to upstream the backend if we
can only focus on the backend itself, whereas all the other parts were
already addressed in preliminary refactorings.

One question though is what the right way to detect the reference format
would be. Reading HEAD and comparing it to "ref: refs/heads/.invalid" is
okay for now, but doesn't really feel like a good fit in the long term
as there has been discussion around dropping the requirement for HEAD to
exist altogether [1] in the future. There are some alternatives:

  - Check for the existence of `reftables/` via `test -d`. This is easy
    enough to do, but also doesn't feel all that robust.

  - Extend git-rev-parse(1) to support a new `--show-reference-format`
    option. We already have `--show-object-format`, so this would be a
    natural fit.

In the long term I'd aim for the second solution -- it's the most robust
solution and would also scale if there ever were additional backends.
Furthermore, we already execute git-rev-parse(1) unconditionally anyway.
So there wouldn't be a performance hit here.

While I plan to introduce the `extensions.refStorage` format before
upstreaming the new backend itself, I think it's still going to be some
time until I submit that patch series. Until then, I'd say we simply use
the proposed way of parsing HEAD and second-guessing that it might
indicate the reftable backend, like Stan also does at [2] for our Bash
completion code. I'll make a mental note to refactor these once we have
the extension ready.

Patrick

[1]: <ZWcOvjGPVS_CMUAk@tanuki>
[2]: <20231130202404.89791-1-stanhu@gmail.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] completion: refactor and support reftables backend
From: Patrick Steinhardt @ 2023-12-01  7:49 UTC (permalink / raw)
  To: Stan Hu; +Cc: git, Christian Couder, Junio C Hamano
In-Reply-To: <20231130202404.89791-1-stanhu@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1031 bytes --]

On Thu, Nov 30, 2023 at 12:24:02PM -0800, Stan Hu wrote:
> Hi,
> 
> This patch series refactors and updates git-completion.bash to become
> compatible with the upcoming reftables backend.

Hi Stan!

Disclaimer up front: I've already reviewed these patches internally at
GitLab.

Regarding the format of this patch submission: in the Git project we
typically use "shallow" threads, where every mail is a reply to the head
of the series (in this case the cover letter). You can configure these
by setting "format.thread=shallow" in your Git configuration.

That's an easy thing to miss though, as MyFirstContribution.txt doesn't
point this out at all. I wonder whether we want to amend it to say so?

Patrick

> Stan Hu (2):
>   completion: refactor existence checks for special refs
>   completion: stop checking for reference existence via `test -f`
> 
>  contrib/completion/git-completion.bash | 43 +++++++++++++++++++++++---
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> -- 
> 2.42.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] completion: stop checking for reference existence via `test -f`
From: Patrick Steinhardt @ 2023-12-01  7:49 UTC (permalink / raw)
  To: Stan Hu; +Cc: git, Christian Couder
In-Reply-To: <20231130202404.89791-3-stanhu@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

On Thu, Nov 30, 2023 at 12:24:04PM -0800, Stan Hu wrote:
> In contrib/completion/git-completion.bash, there are a bunch of
> instances where we read special refs like HEAD, MERGE_HEAD,
> REVERT_HEAD, and others via the filesystem. However, the upcoming
> reftable refs backend won't use '.git/HEAD' at all but instead will
> write an invalid refname as placeholder for backwards compatibility,
> which will break the git-completion script.
> 
> Update the '__git_ref_exists' function to:
> 
> 1. Recognize the placeholder '.git/HEAD' written by the reftable
>    backend (its content is specified in the reftable specs).
> 2. If reftable is in use, use 'git rev-parse' to determine whether the
>     given ref exists.
> 3. Otherwise, continue to use 'test -f' to check for the ref's filename.

Nit, not worth a reroll: you already document this in the code, but I
think it could help to also briefly explain why we're going through all
of these hoops here instead of just using git-rev-parse(1) everywhere in
the commit message.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 07/24] midx: implement `--retain-disjoint` mode
From: Patrick Steinhardt @ 2023-12-01  8:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWjisIgiuziMvBph@nand.local>

[-- Attachment #1: Type: text/plain, Size: 2367 bytes --]

On Thu, Nov 30, 2023 at 02:29:52PM -0500, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 11:18:51AM +0100, Patrick Steinhardt wrote:
> > > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > > index d130e65b28..ac0c7b124b 100644
> > > --- a/Documentation/git-multi-pack-index.txt
> > > +++ b/Documentation/git-multi-pack-index.txt
> > > @@ -54,6 +54,14 @@ write::
> > >  		"disjoint". See the "`DISP` chunk and disjoint packs"
> > >  		section in linkgit:gitformat-pack[5] for more.
> > >
> > > +	--retain-disjoint::
> > > +		When writing a multi-pack index with a reachability
> > > +		bitmap, keep any packs marked as disjoint in the
> > > +		existing MIDX (if any) as such in the new MIDX. Existing
> > > +		disjoint packs which are removed (e.g., not listed via
> > > +		`--stdin-packs`) are ignored. This option works in
> > > +		addition to the '+' marker for `--stdin-packs`.
> >
> > I'm trying to understand when you're expected to pass this flag and when
> > you're expected not to pass it. This documentation could also help in
> > the documentation here so that the user can make a more informed
> > decision.
> 
> I think there are multiple reasons that you may or may not want to pass
> that flag. Certainly if you're not using disjoint packs (and instead
> only care about single-pack verbatim reuse over the MIDX's preferred
> packfile), then you don't need to pass it.
> 
> But if you are using disjoint packs, you may want to pass it if you are
> adding packs to the MIDX which are disjoint, _and_ you want to hold onto
> the existing set of disjoint packs.
> 
> But if you want to change the set of disjoint packs entirely, you would
> want to omit this flag (unless you knew a-priori that you were going to
> drop all of the currently marked disjoint packs from the new MIDX you
> are writing, e.g. with --stdin-packs).
> 
> If you think it would be useful, I could try and distill some of this
> down, but I think that there is likely too much detail here for it to be
> useful in user-facing documentation.

Yeah, this indeed feels too detailed to be added here. I was hoping for
a simple "Never do this if"-style rule that points out why it is unwise
under some circumstances, but seems like it's not as simple as that.

Well, so be it. Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 08/24] pack-objects: implement `--ignore-disjoint` mode
From: Patrick Steinhardt @ 2023-12-01  8:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWjjSOJHw6Q1qQ+y@nand.local>

[-- Attachment #1: Type: text/plain, Size: 3638 bytes --]

On Thu, Nov 30, 2023 at 02:32:24PM -0500, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 11:18:57AM +0100, Patrick Steinhardt wrote:
> > > Instead, teach `pack-objects` a special `--ignore-disjoint` which is the
> > > moral equivalent of marking the set of disjoint packs as kept, and
> > > ignoring their contents, even if it would have otherwise been packed. In
> > > fact, this similarity extends down to the implementation, where each
> > > disjoint pack is first loaded, then has its `pack_keep_in_core` bit set.
> > >
> > > With this in place, we can use the kept-pack cache from 20b031fede
> > > (packfile: add kept-pack cache for find_kept_pack_entry(), 2021-02-22),
> > > which looks up objects first in a cache containing just the set of kept
> > > (in this case, disjoint) packs. Assuming that the set of disjoint packs
> > > is a relatively small portion of the entire repository (which should be
> > > a safe assumption to make), each object lookup will be very inexpensive.
> >
> > This cought me by surprise a bit. I'd have expected that in the end,
> > most of the packfiles in a repository would be disjoint. Using for
> > example geometric repacks, my expectation was that all of the packs that
> > get written via geometric repacking would eventually become disjoint
> > whereas new packs added to the repository would initially not be.
> 
> Which part are you referring to here? If you're referring to the part
> where I say that the set of disjoint packs is relatively small in
> proposition to the rest of the packs, I think I know where the confusion
> is.

Yeah, that's what I was referring to.

> I'm not saying that the set of disjoint packs is small in comparison to
> the rest of the repository by object count, but rather by count of packs
> overall. You're right that packs from pushes will not be guaranteed to
> be disjoint upon entering the repository, but will become disjoint when
> geometrically repacked (assuming that the caller uses --ignore-disjoint
> when repacking).

I was actually thinking about it in the number of packfiles, not number
of objects. I'm mostly coming from the angle of geometric repacking
here, where it is totally expected that you have a comparatively large
number of packfiles when your repository is big. With a geometric factor
of 2, you'll have up to `log2($numobjects)` many packfiles in your repo
while keeping the geometric sequence intact.

In something like linux.git with almost 10M objects that boils down to
23 packfiles, and I'd assume that all of these would be disjoint in the
best case. So if you gain new packfiles by pushing into the repository
then I'd think that it's quite likely that the number of non-disjoint
packfiles is smaller than the number of disjoint ones.

I do realize though that in absolute numbers, this isn't all that many.
I was also thinking ahead though to a future where we have something
like geometric repacking with maximum packfile sizes working well
together so that we'll be able to merge packfiles together until they
reach a certain maximum size, and afterwards they are just left alone.
This would help to avoid those "surprise" repack cases where everything
is again packed into a single packfile for the biggest repositories out
there. But it would of course also lead to an increase in packfiles in
those huge repositories.

Anyway, I feel like I'm rambling. In the end it's probably going to be
fine, I was simply surprised by your assumption that the number of
disjoint packfiles should usually be much smaller than the number of
non-disjoint ones.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
From: Patrick Steinhardt @ 2023-12-01  8:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZWjk/TYzsrdHefgy@nand.local>

[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

On Thu, Nov 30, 2023 at 02:39:41PM -0500, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 11:18:19AM +0100, Patrick Steinhardt wrote:
[snip]
> Then you'd have five patch series, where each series does roughly the
> following:
> 
>   1. Preparatory clean-up.
>   2. Implementing the DISP chunk, as well as --retain-disjoint, without
>      a way to generate such packs.
>   3. Implement a way to generate such packs, but without actually being
>      able to reuse more than one of them.
>   4. Implement multi-pack reuse, but without actually reusing any packs.
>   5. Enable multi-pack reuse (and implement the required scaffolding to
>      do so), and test it.
> 
> That's the most sensible split that I could come up with, at least.

Looks sensible to me.

> But
> I still find it relatively unsatisfying for a couple of reasons:
> 
>   - With the exception of the last group of patches, none of the earlier
>     series enable any new, useful behavior on their own. IOW, if we just
>     merged the first three series and then forgot about this topic, we
>     wouldn't have done anything useful ;-).

Well, sometimes I wish we'd buy more into the iterative style of working
in the Git project, where it's fine to land patch series that only work
into the direction of a specific topic but don't yet do anything
interesting by themselves. The benefits would be both that individual
pieces land faster while also ensuring that the review load is kept at
bay.

But there's of course also downsides to this, especially in an open
source project like Git:

  - Contributors may go away in the middle of their endeavour, leaving
    behind dangling pieces that might have complicated some of our
    architecture without actually reaping the intended benefits.

  - Overall, it may take significantly longer to get all pieces into
    Git.

  - Contributors need to do a much better job to explain where they are
    headed when the series by itself doesn't do anything interesting
    yet.

So I understand why we typically don't work this way.

I wonder whether a compromise would be to continue sending complete
patch series, but explicitly point out "break points" in your patch
series. These break points could be an indicator to the maintainer that
it's fine to merge everything up to it while still keeping out the
remainder of the patch series.

>   - The fourth series (which actually implements multi-pack reuse) still
>     remains the most complicated, and would likely be the most difficult
>     to review. So I think you'd still have one difficult series to
>     review, plus four other series which are slightly less difficult to
>     review ;-).

That's fine in my opinion, there's no surprise here that a complicated
topic is demanding for both the author and the reviewer.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* git:// warn as connection not secure
From: Jonny Grant @ 2023-12-01 11:57 UTC (permalink / raw)
  To: git

Hello
May I ask if anyone has suggested adding a default warning that git:// is not a secure connection?

ie "warning: git:// is not a secure connection. https and ssh are secure."

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 9863119, done.
remote: Counting objects: 100% (2360/2360), done.
remote: Compressing objects: 100% (1282/1282), done.
^Cceiving objects:   0% (8779/9863119), 3.21 MiB | 6.41 MiB/s

Kind regards
Jonny

^ permalink raw reply

* [PATCH v6] subtree: fix split processing with multiple subtrees present
From: Zach FettersMoore via GitGitGadget @ 2023-12-01 14:54 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Zach FettersMoore, Zach FettersMoore
In-Reply-To: <pull.1587.v5.git.1701206267300.gitgitgadget@gmail.com>

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

To see this in practice you can use the open source GitHub repo
'apollo-ios-dev' and do the following in order:

-Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
 directories
-Create a commit containing these changes
-Do a split on apollo-ios-codegen
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios-codegen.git
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
   - Depending on the current state of the 'apollo-ios-dev' repo
     you may see the issue at this point if the last split was on
     apollo-ios
-Do a split on apollo-ios
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios.git
   - git subtree split --prefix=apollo-ios --squash --rejoin
-Make changes to a file in apollo-ios-codegen
-Create a commit containing the change(s)
-Do a split on apollo-ios-codegen
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
-To see that the patch fixes the issue you can use the custom subtree
 script in the repo so following the same steps as above, except
 instead of using 'git subtree ...' for the commands use
 'git-subtree.sh ...' for the commands

You will see that the final split is looking for the last split
on apollo-ios-codegen to use as it's starting point to process
commits. Since there is a split commit from apollo-ios in between the
2 splits run on apollo-ios-codegen, the processing ends up traversing
the entire history of apollo-ios which increases the time it takes to
do a split based on how long of a history apollo-ios has, while none
of these commits are relevant to the split being done on
apollo-ios-codegen.

So this commit makes a change to the processing of commits for the
split command in order to ignore non-mainline commits from other
subtrees such as apollo-ios in the above breakdown by adding a new
function 'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to
still function as expected but removes all of the unnecessary
processing that takes place currently which greatly inflates the
processing time. In the above example, previously the final split
would take ~10-12 minutes, while after this fix it takes seconds.

Added a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed remains at 0, meaning none of the commits
from the second subtree were processed.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrated that when using multiple subtrees,
A and B, when doing a split on subtree B, the
processing does not traverse the entire history
of subtree A which is unnecessary and would cause
the 'extracount' of processed commits to climb
based on the number of commits in the history of
subtree A.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
    subtree: fix split processing with multiple subtrees present
    
    When there are multiple subtrees in a repo and git subtree split
    --rejoin is being used for the subtrees, the processing of commits for a
    new split can take a significant (and constantly growing) amount of time
    because the split commits from other subtrees cause the processing to
    have to scan the entire history of the other subtree(s). This patch
    filters out the other subtree split commits that are unnecessary for the
    split commit processing.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v5:

 1:  e7445a95f30 ! 1:  2a65ec0e4df subtree: fix split processing with multiple subtrees present
     @@ Commit message
          To see this in practice you can use the open source GitHub repo
          'apollo-ios-dev' and do the following in order:
      
     -    -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'
     +    -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
           directories
          -Create a commit containing these changes
          -Do a split on apollo-ios-codegen
     +       - Do a fetch on the subtree repo
     +          - git fetch git@github.com:apollographql/apollo-ios-codegen.git
             - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
     +       - Depending on the current state of the 'apollo-ios-dev' repo
     +         you may see the issue at this point if the last split was on
     +         apollo-ios
          -Do a split on apollo-ios
     +       - Do a fetch on the subtree repo
     +          - git fetch git@github.com:apollographql/apollo-ios.git
             - git subtree split --prefix=apollo-ios --squash --rejoin
          -Make changes to a file in apollo-ios-codegen
          -Create a commit containing the change(s)
          -Do a split on apollo-ios-codegen
             - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
     +    -To see that the patch fixes the issue you can use the custom subtree
     +     script in the repo so following the same steps as above, except
     +     instead of using 'git subtree ...' for the commands use
     +     'git-subtree.sh ...' for the commands
      
          You will see that the final split is looking for the last split
          on apollo-ios-codegen to use as it's starting point to process


 contrib/subtree/git-subtree.sh     | 30 +++++++++++++++++++++-
 contrib/subtree/t/t7900-subtree.sh | 40 ++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..a0bf958ea66 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,6 +778,22 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+	assert test $# = 1
+	local rev="$1"
+	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	then
+		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
+		then
+			return 0
+		fi
+	fi
+	return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
@@ -963,7 +979,19 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedparents=''
+		for parent in $parents
+		do
+			if ! should_ignore_subtree_split_commit "$parent"
+			then
+				parsedparents="$parsedparents$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedparents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 49a21dd7c9c..ca4df5be832 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -385,6 +385,46 @@ test_expect_success 'split sub dir/ with --rejoin' '
 	)
 '
 
+# Tests that commits from other subtrees are not processed as
+# part of a split.
+#
+# This test performs the following:
+# - Creates Repo with subtrees 'subA' and 'subB'
+# - Creates commits in the repo including changes to subtrees
+# - Runs the following 'split' and commit' commands in order:
+# 	- Perform 'split' on subtree A
+# 	- Perform 'split' on subtree B
+# 	- Create new commits with changes to subtree A and B
+# 	- Perform split on subtree A
+# 	- Check that the commits in subtree B are not processed
+#			as part of the subtree A split
+test_expect_success 'split with multiple subtrees' '
+	subtree_test_create_repo "$test_count" &&
+	subtree_test_create_repo "$test_count/subA" &&
+	subtree_test_create_repo "$test_count/subB" &&
+	test_create_commit "$test_count" main1 &&
+	test_create_commit "$test_count/subA" subA1 &&
+	test_create_commit "$test_count/subA" subA2 &&
+	test_create_commit "$test_count/subA" subA3 &&
+	test_create_commit "$test_count/subB" subB1 &&
+	git -C "$test_count" fetch ./subA HEAD &&
+	git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD &&
+	git -C "$test_count" fetch ./subB HEAD &&
+	git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD &&
+	test_create_commit "$test_count" subADir/main-subA1 &&
+	test_create_commit "$test_count" subBDir/main-subB1 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 1" &&
+	git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -m "Sub B Split 1" &&
+	test_create_commit "$test_count" subADir/main-subA2 &&
+	test_create_commit "$test_count" subBDir/main-subB2 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 2" &&
+	test "$(git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
-- 
gitgitgadget

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox