* Suggestions for collaboration workflows in large repos
From: Matthew Hughes @ 2026-05-29 16:31 UTC
To: git
Hi,
I'm looking for some git workflow suggestions to help cut down on unnecessary
fetching when working in a large repo with many (hundreds) of other devs and
thousands of branches. Specifically, if in this repo I use the common config to
just fetch all the remote heads:
$ git config set remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
Then I find I get a lot of noise from all the branches being
created/updated/deleted as well as an increase in the size of my local repo due
to all the objects I need to fetch across all those branches.
To clarify, the general performance of git in this repo is reasonable (shoutout
to `scalar`) but I am interested in cutting down on this fetching since when
working in this repo I'm generally only interested in a tiny subset of all
branches:
1. The `main` branch (that everyone merges into)
2. Any of _my_ branches
3. Occasionally, one of my colleagues' branches, so e.g. I can check out their
   code locally to review (most reviewing I do in the web UI, this is
   GitHub)
I have a prefix for all my branches: `mhughes-`, so to sort out just the
first two points I can configure git to fetch `main` and references with that
prefix:
$ git config set --comment 'fetch main' remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'
$ git config set --append --comment 'fetch my branches' remote.origin.fetch '+refs/heads/mhughes-*:refs/remotes/origin/mhughes-*'
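As a minimal sketch of how those two refspecs are put together, the helper below builds the refspec for a branch name or glob. `branch_refspec` is a hypothetical name chosen for illustration, not a git command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fetch refspec that maps a branch name (or a
# glob such as 'mhughes-*') on a remote to its remote-tracking ref.
branch_refspec() {
    local remote="$1" pattern="$2"
    printf '+refs/heads/%s:refs/remotes/%s/%s\n' "$pattern" "$remote" "$pattern"
}

# Print the refspecs for `main` and for the prefixed branches.
branch_refspec origin main
branch_refspec origin 'mhughes-*'
```

The printed values are the same strings passed to `git config set` above, so the output of `branch_refspec` could be used as the value argument there.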
But then when I do want to check out a colleague's branch I need to explicitly
fetch the exact ref like:
$ git fetch origin some-colleague-branch
$ git checkout FETCH_HEAD -b some-colleague-branch
Which is ok (it's my current workflow), but it means I have to re-fetch the
exact ref if I want to bring in changes that they make after my initial fetch.
I could add an explicit fetch of their branch like:
$ git config set --append remote.origin.fetch '+refs/heads/some-colleague-branch:refs/remotes/origin/some-colleague-branch'
So that each `git fetch` also brings in updates to that branch, but in the
remote we delete branches once their changes are merged, so if I leave that
config I'll eventually (once they merge their change and delete the branch) run
into errors when fetching like:
fatal: couldn't find remote ref refs/heads/some-colleague-branch
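One way to sidestep that error, sketched here under the assumption that an extra round trip to the remote is acceptable, is to ask the remote whether the branch still exists before fetching it. `fetch_if_present` is a hypothetical helper name; `git ls-remote --exit-code` exits with status 2 when no ref matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a single branch into its remote-tracking ref,
# but only if the remote still advertises that branch.
fetch_if_present() {
    local remote="$1" branch="$2"
    # Non-zero status means "no matching ref" (2) or some other failure,
    # e.g. the remote could not be reached.
    if git ls-remote --exit-code "$remote" "refs/heads/$branch" >/dev/null
    then
        git fetch "$remote" "+refs/heads/$branch:refs/remotes/$remote/$branch"
    else
        echo "skipping $branch: not found on $remote (or remote unreachable)" >&2
    fi
}
```

For example, `fetch_if_present origin some-colleague-branch` would update `origin/some-colleague-branch` while the branch exists and print a note to stderr once it has been deleted.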
Does anyone have suggestions to make this smoother? Or alternative workflows
for achieving this goal? I'd also be curious to hear about other approaches
people take when working in large repos with lots of other collaborators.
Or am I just using git wrong in a repo like this, and should adopt another
approach?
I thought about doing something like tracking
`refs/heads*/some-colleague-branch` from the remote, since with the wildcard
`*` I at least won't get the fatal error on the missing reference during fetch, but
that risks my config containing an ever growing list of such wildcards, or a
bunch of manual work occasionally cleaning up old ones (or maybe that could be
automated).
Thanks,
Matt
* Re: Suggestions for collaboration workflows in large repos
From: Ben Knoble @ 2026-05-29 17:56 UTC
To: Matthew Hughes; +Cc: git
> On 29 May 2026, at 12:47, Matthew Hughes <matthewhughes934@gmail.com> wrote:
>
> Hi,
>
> I'm looking for some git workflow suggestions to help cut down on unnecessary
> fetching when working in a large repo with many (hundreds) of other devs and
> thousands of branches. Specifically, if in this repo I use the common config to
> just fetch all the remote heads:
>
> $ git config set remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
>
> Then I find I get a lot of noise from all the branches being
> created/updated/deleted as well as an increase in the size of my local repo due
> to all the objects I need to fetch across all those branches.
>
> To clarify, the general performance of git in this repo is reasonable (shoutout
> to `scalar`) but I am interested in cutting down on this fetching since when
> working in this repo I'm generally only interested in a tiny subset of all
> branches:
>
> 1. The `main` branch (that everyone merges into)
> 2. Any of _my_ branches
> 3. Occasionally, one of my colleagues' branches, so e.g. I can check out their
>    code locally to review (most reviewing I do in the web UI, this is
>    GitHub)
>
> I have a prefix for all my branches: `mhughes-`, so to sort out just the
> first two points I can configure git to fetch `main` and references with that
> prefix:
>
> $ git config set --comment 'fetch main' remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'
> $ git config set --append --comment 'fetch my branches' remote.origin.fetch '+refs/heads/mhughes-*:refs/remotes/origin/mhughes-*'
>
> But then when I do want to check out a colleague's branch I need to explicitly
> fetch the exact ref like:
>
> $ git fetch origin some-colleague-branch
> $ git checkout FETCH_HEAD -b some-colleague-branch
>
> Which is ok (it's my current workflow), but it means I have to re-fetch the
> exact ref if I want to bring in changes that they make after my initial fetch.
>
> I could add an explicit fetch of their branch like:
>
> $ git config set --append remote.origin.fetch '+refs/heads/some-colleague-branch:refs/remotes/origin/some-colleague-branch'
>
> So that each `git fetch` also brings in updates to that branch, but in the
> remote we delete branches once their changes are merged, so if I leave that
> config I'll eventually (once they merge their change and delete the branch) run
> into errors when fetching like:
>
> fatal: couldn't find remote ref refs/heads/some-colleague-branch
>
> Does anyone have suggestions to make this smoother? Or alternative workflows
> for achieving this goal? I'd also be curious to hear about other approaches
> people take when working in large repos with lots of other collaborators.
> Or am I just using git wrong in a repo like this, and should adopt another
> approach?
>
> I thought about doing something like tracking
> `refs/heads*/some-colleague-branch` from the remote, since with the wildcard
> `*` I at least won't get the fatal error on the missing reference during fetch, but
> that risks my config containing an ever growing list of such wildcards, or a
> bunch of manual work occasionally cleaning up old ones (or maybe that could be
> automated).
>
> Thanks,
> Matt
My current advice is to enable git-maintenance on such a repo, where prefetches and commit graphs and so on will give you a nice perf boost. Then I keep the default fetch all heads config and don’t mind the noise too much.
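A minimal sketch of that advice, assuming a git version that ships `git maintenance` and that the commands are run inside the repository in question. `enable_maintenance` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: register the current repository for scheduled
# background maintenance, then run the prefetch and commit-graph tasks once
# by hand.
enable_maintenance() {
    git maintenance start &&
        git maintenance run --task=prefetch --task=commit-graph
}
```

Calling `enable_maintenance` once per clone should be enough; after that the scheduled jobs run in the background.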
* Re: Suggestions for collaboration workflows in large repos
From: Matthew Hughes @ 2026-06-02 18:35 UTC
To: Ben Knoble; +Cc: git
On Fri, May 29, 2026 at 01:56:02PM -0400, Ben Knoble wrote:
> My current advice is to enable git-maintenance on such a repo, where
> prefetches and commit graphs and so on will give you a nice perf boost. Then
> I keep the default fetch all heads config and don’t mind the noise too much.
Thanks, I do have maintenance activated (I believe `scalar` handled that for
me) and that does noticeably speed up some operations, and I have found the
performance in general for almost all operations to be much better than I
expected (having not worked in such a large repo before). My only real issue is
in fetching, since I really don't want to waste the time pulling down all the
other branches in the repo that I almost certainly will never need locally.
* Re: Suggestions for collaboration workflows in large repos
From: Matthew Hughes @ 2026-05-29 18:06 UTC
To: git
On Fri, May 29, 2026 at 05:31:17PM +0100, Matthew Hughes wrote:
> I thought about doing something like tracking
> `refs/heads*/some-colleague-branch` from the remote, since with the wildcard
> `*` I at least won't get the fatal error on the missing reference during fetch, but
> that risks my config containing an ever growing list of such wildcards, or a
> bunch of manual work occasionally cleaning up old ones (or maybe that could be
> automated).
I hacked some scripts to automate this. Firstly, one for fetching:
1. Fetches the branch
2. Adds a fetch config with wildcard hacks so `git fetch` brings in updates for
   that branch (the refspec should match _exactly_ that branch and never
   anything more)
3. Adds a separate ref to record that we're tracking this branch (so something
   knows to clean it up later)
#!/usr/bin/env bash

set -o errexit -o pipefail -o nounset

# save command as e.g. git-fetch-other
CMD_NAME="$(basename "$0" | sed 's/^git-//')"
if [ $# -lt 1 ]
then
    echo "usage: git $CMD_NAME branch-name [ remote-name ]" >&2
    exit 1
fi

BRANCH_NAME="$1"
REMOTE_NAME="${2:-origin}"
FETCH_CONFIG_NAME="remote.$REMOTE_NAME.fetch"

git fetch "$REMOTE_NAME" "$BRANCH_NAME"
# start the new branch at the commit just fetched, not at the current HEAD
git checkout -b "$BRANCH_NAME" FETCH_HEAD

# we want to record that we are tracking this branch, to do this create
# a new ref whose name tells us what we're tracking, but whose value is
# unimportant. So as a placeholder value just use the hash of an empty tree
# taken from https://git.kernel.org/pub/scm/git/git.git/commit/?id=9c8a294a1ae1335511475db9c0eb8841c0ec9738
EMPTY_TREE_REF="$(git hash-object -t tree /dev/null)"

# refspec used to track the branch: we expect branches to be deleted from the
# upstream when merged so tracking exactly:
# "+refs/heads/$BRANCH_NAME:refs/remotes/$REMOTE_NAME/$BRANCH_NAME" will error
# when we go to fetch that exact ref after it's removed upstream.
# so HACK around this: add wildcards that we still expect to only ever match
# this exact branch (but doesn't have the issue of git complaining when it
# tries to fetch an _exact_ ref)
TRACKING_REFSPEC="+refs/heads*/$BRANCH_NAME:refs/remotes*/$REMOTE_NAME/$BRANCH_NAME"

# record that we're tracking this branch. First check we've not already
# recorded this, then ...
if ! git config get --local --fixed-value --value "$TRACKING_REFSPEC" "$FETCH_CONFIG_NAME" >/dev/null
then
    # ... set the config to track it for fetching, and ...
    git config set --comment "$CMD_NAME: tracking at $(date -I)" --local --append "$FETCH_CONFIG_NAME" "$TRACKING_REFSPEC"
    # ... record that we have special cased this tracking
    git update-ref "refs/tracked/$REMOTE_NAME/$BRANCH_NAME" "$EMPTY_TREE_REF"
fi
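To see what the wildcard refspec in the script can match, here is a small model of the matching rule, written as a sketch rather than a copy of git's own code. It assumes the usual behaviour for a pattern with one `*`: the text before the `*` must be a prefix of the ref, the text after it a suffix, and the `*` stands for whatever is left, including nothing at all or text containing `/`. `refspec_matches` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model of matching a ref name against a one-wildcard refspec
# pattern (an illustration only, not git's implementation).
refspec_matches() {
    local pattern="$1" ref="$2"
    local prefix="${pattern%%\**}"   # text before the first '*'
    local suffix="${pattern#*\*}"    # text after the first '*'
    # prefix and suffix must both match and must not overlap
    [[ "$ref" == "$prefix"* && "$ref" == *"$suffix" ]] &&
        (( ${#ref} >= ${#prefix} + ${#suffix} ))
}

for ref in refs/heads/topic refs/heads/topic2 refs/heads/team/topic
do
    if refspec_matches 'refs/heads*/topic' "$ref"
    then echo "match:    $ref"
    else echo "no match: $ref"
    fi
done
```

Under this model `refs/heads/topic` matches because the `*` is empty, which is what the hack relies on. A branch such as `team/topic` would match as well, so the pattern is only exact as long as no branch on the remote ends in `/topic`.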
And the cleanup script (needs to be run periodically):
1. Collects all the remote branches we know about
2. Goes through the references created in step 3. above and checks whether any
   of the branches recorded there are missing from the remote (I have
   fetch.prune=true to keep the remote-tracking references up-to-date)
3. If they are, drops the tracking config for that branch
#!/usr/bin/env bash

set -o errexit -o pipefail -o nounset

REMOTE_NAME="${1:-origin}"
TRACKED_REF_PREFIX="refs/tracked/$REMOTE_NAME"
REMOTE_REF_PREFIX="refs/remotes/$REMOTE_NAME"

# set of branches that still have a remote-tracking ref
declare -A remote_branch_lookup
while read -r remote_ref
do
    # strip prefix, e.g. 'refs/remotes/origin/some-branch' -> 'some-branch'
    branch_name="${remote_ref#"$REMOTE_REF_PREFIX"/}"
    remote_branch_lookup["$branch_name"]=1
done < <(git for-each-ref --format='%(refname)' "$REMOTE_REF_PREFIX/")

while read -r tracking_info
do
    tracked_branch="${tracking_info#"$TRACKED_REF_PREFIX"/}"
    if ! [[ -v "remote_branch_lookup[$tracked_branch]" ]]
    then
        echo "branch $tracked_branch has been removed from the remote, untracking it"
        git update-ref -d "$TRACKED_REF_PREFIX/$tracked_branch"

        tracking_refspec="+refs/heads*/$tracked_branch:refs/remotes*/$REMOTE_NAME/$tracked_branch"
        git config unset --local --fixed-value --value "$tracking_refspec" "remote.$REMOTE_NAME.fetch"
    fi
done < <(git for-each-ref --format='%(refname)' "$TRACKED_REF_PREFIX/")
So functionally I think this allows for the workflow I want, but does feel like
a big ol' hack :>
* Re: Suggestions for collaboration workflows in large repos
From: Toon Claes @ 2026-06-03 13:44 UTC
To: Matthew Hughes, git
Matthew Hughes <matthewhughes934@gmail.com> writes:
> On Fri, May 29, 2026 at 05:31:17PM +0100, Matthew Hughes wrote:
>> I thought about doing something like tracking
>> `refs/heads*/some-colleague-branch` from the remote, since with the wildcard
>> `*` I at least won't get the fatal error on the missing reference during fetch, but
>> that risks my config containing an ever growing list of such wildcards, or a
>> bunch of manual work occasionally cleaning up old ones (or maybe that could be
>> automated).
I feel your problem, although a lot less in the project I'm working on
lately. I have these refspecs by the way:
fetch = +refs/heads/master:refs/remotes/origin/master
fetch = +refs/heads/toon-*:refs/remotes/origin/toon-*
> I hacked some scripts to automate this. Firstly, one for fetching:
>
> 1. Fetches the branch
> 2. Adds a fetch config with wildcard hacks so `git fetch` brings in updates for
> that branch (the refspec should match _exactly_ that branch and never
> anything more)
> 3. Adds a separate ref to record that we're tracking this branch (so something
> knows to clean it up later)
>
> #!/usr/bin/env bash
>
> set -o errexit -o pipefail -o nounset
>
> # save command as e.g. git-fetch-other
> CMD_NAME="$(basename "$0" | sed 's/git-//g')"
> if [ $# -lt 1 ]
> then
> echo "usage: git $CMD_NAME branch-name [ remote-name ]" >&2
> exit 1
> fi
>
> BRANCH_NAME="$1"
> REMOTE_NAME="${2:-origin}"
> FETCH_CONFIG_NAME="remote.$REMOTE_NAME.fetch"
>
> git fetch "$REMOTE_NAME" "$BRANCH_NAME"
> git checkout -b "$BRANCH_NAME" FETCH_HEAD
>
> # we want to record that we are tracking this branch, to do this create
> # a new ref whose name tells us what we're tracking, but whose value is
> # unimportant. So as a placeholder value just use the hash of an empty tree
> # taken from https://git.kernel.org/pub/scm/git/git.git/commit/?id=9c8a294a1ae1335511475db9c0eb8841c0ec9738
> EMPTY_TREE_REF="$(git hash-object -t tree /dev/null)"
>
> # refspec used to track the branch: we expect branches to be deleted from the
> # upstream when merged so tracking exactly:
> # "+refs/heads/$BRANCH_NAME:refs/remotes/$REMOTE_NAME/$BRANCH_NAME" will error
> # when we go to fetch that exact ref after its removed upstream.
> # so HACK around this: add wildcards that we still expect to only ever match
> # this exact branch (but doesn't have the issue of git complaining when it
> # tries to fetch an _exact_ ref)
> TRACKING_REFSPEC="+refs/heads*/$BRANCH_NAME:refs/remotes*/$REMOTE_NAME/$BRANCH_NAME"
>
> # record that we're tracking this branch. First check we've not already
> # recorded this, then ...
> if ! git config get --local --fixed-value --value "$TRACKING_REFSPEC" "$FETCH_CONFIG_NAME" >/dev/null
> then
> # ... set the config to track it for fetching, and ...
> git config set --comment "$CMD_NAME: tracking at $(date -I)" --local --append "$FETCH_CONFIG_NAME" "$TRACKING_REFSPEC"
> # ... record that we have special cased this tracking
> git update-ref "refs/tracked/$REMOTE_NAME/$BRANCH_NAME" "$EMPTY_TREE_REF"
> fi
It seems to be a bit more advanced than the alias I have:
cofetch = !sh -c 'git fetch $1 $2:remotes/$1/$2 && git switch -c $2 remotes/$1/$2' -
You need to pass it the remote and the branch name (in reverse order of
yours, which makes sense if you want the remote to be optional).
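For readability, the same alias can be sketched as a shell function with quoted arguments. This is an illustrative rewrite, not a drop-in replacement: it spells the destination out as `refs/remotes/...` where the alias uses the shorthand `remotes/...`, and the function name simply reuses the alias name.

```shell
#!/usr/bin/env bash
# Sketch of the `cofetch` alias as a function: fetch one branch into its
# remote-tracking ref, then create and switch to a local branch from it.
cofetch() {
    local remote="$1" branch="$2"
    git fetch "$remote" "$branch:refs/remotes/$remote/$branch" &&
        git switch -c "$branch" "refs/remotes/$remote/$branch"
}
```

It would be called as `cofetch origin some-colleague-branch`, with the remote first as in the alias.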
> And the cleanup script (needs to be run periodically):
>
> 1. Collects all the remote branches we know about
> 2. Checks all the references from step 3. above and checks if any branches
> defined there are missing remotes (I have fetch.prune=true to keep the remote
> tracking references up-to-date)
> 3. If they are, drops the tracking config for that branch
>
> #!/usr/bin/env bash
>
> set -o errexit -o pipefail -o nounset
>
> REMOTE_NAME="${1:-origin}"
> TRACKED_REF_PREFIX="refs/tracked/$REMOTE_NAME"
> REMOTE_REF_PREFIX="refs/remotes/$REMOTE_NAME"
>
> declare -A remote_branch_lookup
> while read -r remote_ref
> do
> # strip prefix, e.g. 'refs/remotes/origin/some-branch' -> 'some-branch'
> branch_name="${remote_ref#$REMOTE_REF_PREFIX/}"
> remote_branch_lookup["$branch_name"]=1
> done < <(git for-each-ref --format='%(refname)' "$REMOTE_REF_PREFIX/")
>
> while read -r tracking_info
> do
> tracked_branch="${tracking_info#$TRACKED_REF_PREFIX/}"
> if ! [[ -v "remote_branch_lookup[$tracked_branch]" ]]
> then
> echo "branch $tracked_branch has been removed from the remote, untracking it"
> git update-ref -d "$TRACKED_REF_PREFIX/$tracked_branch"
>
> tracking_refspec="+refs/heads*/$tracked_branch:refs/remotes*/$REMOTE_NAME/$tracked_branch"
> git config unset --local --fixed-value --value "$tracking_refspec" "remote.$REMOTE_NAME.fetch"
> fi
> done < <(git for-each-ref --format='%(refname)' "$TRACKED_REF_PREFIX/")
>
> So functionally I think this allows for the workflow I want, but does feel like
> a big ol' hack :>
I agree it feels hacky, but I don't really see how we can generalize it
more so it will become a standard feature in git?
I was thinking you can already pass `-c remote.origin.fetch=<refspec>`
(multiple times) to git-clone(1), but in practice it doesn't seem to
work because that config is additive, so it adds the refspec, instead of
overwriting, so you're getting:
fatal: multiple updates for ref 'refs/remotes/origin/main' not allowed
And you cannot combine it with `--single-branch`, although you could do
a single branch clone and then add additional refspecs later.
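A minimal sketch of that last option, assuming the remote is named `origin` and using the `git config set --append` form seen earlier in this thread. `clone_narrow` and its parameters are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone only the main branch, then append a second
# refspec so later fetches also bring in branches with a given prefix.
clone_narrow() {
    local url="$1" dir="$2" prefix="$3" main_branch="${4:-main}"
    git clone --single-branch --branch "$main_branch" "$url" "$dir" &&
        git -C "$dir" config set --append remote.origin.fetch \
            "+refs/heads/${prefix}*:refs/remotes/origin/${prefix}*"
}
```

For instance, `clone_narrow <url> repo toon-` would leave the new clone with one refspec for `main` and one for `toon-*`; the prefixed branches themselves only arrive with the next `git fetch`.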
--
Cheers,
Toon