* Slow git add . performance in large repo
From: Yissachar Radcliffe @ 2025-03-17 18:53 UTC
To: git
We have a relatively large git repo and have noticed that `git add .`
operations are slow (~1.5-2s). We have core.fsmonitor and
core.untrackedCache set to true and `git status` executes in ~300ms.
When I turn on trace2 I can see that almost all the time is spent in
read_directory, which visits 26960 directories and 77989 paths.
I can use `git add <foo>` or `git add -u .` to speed things up but
`git add .` is the most convenient for us. I created a small script to
pipe the results of `git status` to `git add` and that runs in <500ms.
This leaves me confused as to why the built-in performance is so slow.
Git version is 2.49.0, macOS 15.3.
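The script itself isn't included in the thread. As a hypothetical illustration of the same idea, the Python sketch below reads `git status --porcelain=v1 -z` (NUL-separated, root-relative paths) and hands every listed path to `git add`. It assumes `repo` is the top level of the working tree; the function names are made up for this example. It uses the real `--pathspec-from-file` / `--pathspec-file-nul` options of `git add` to avoid command-line length limits, and the global `--literal-pathspecs` option so file names aren't read as glob patterns.

```python
import subprocess


def paths_from_status(porcelain_z: bytes) -> list[bytes]:
    """Extract the paths to stage from `git status --porcelain=v1 -z` output."""
    fields = porcelain_z.split(b"\0")
    paths = []
    i = 0
    while i < len(fields):
        entry = fields[i]
        i += 1
        if not entry:
            continue  # output ends with a NUL, leaving an empty last field
        status, path = entry[:2], entry[3:]  # each entry is "XY PATH"
        paths.append(path)
        if b"R" in status or b"C" in status:
            # With -z, a rename/copy entry is followed by its original path
            # as a separate field; this sketch only stages the new path.
            i += 1
    return paths


def stage_status_paths(repo: str = ".") -> None:
    """Stage every path `git status` reports, untracked ones included.

    `repo` must be the top level of the working tree, because porcelain
    output is always relative to the repository root.
    """
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain=v1", "-z"],
        check=True, capture_output=True,
    ).stdout
    paths = paths_from_status(out)
    if not paths:
        return
    # Pass paths on stdin (NUL-separated) and treat them literally.
    subprocess.run(
        ["git", "-C", repo, "--literal-pathspecs", "add",
         "--pathspec-from-file=-", "--pathspec-file-nul"],
        input=b"\0".join(paths) + b"\0",
        check=True,
    )
```

Because `git status` runs without a pathspec, it can use the untracked cache; `git add` then only receives the handful of paths that changed.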
* Re: Slow git add . performance in large repo
From: brian m. carlson @ 2025-03-17 23:00 UTC
To: Yissachar Radcliffe; +Cc: git
On 2025-03-17 at 18:53:10, Yissachar Radcliffe wrote:
> We have a relatively large git repo and have noticed that `git add .`
> operations are slow (~1.5-2s). We have core.fsmonitor and
> core.untrackedCache set to true and `git status` executes in ~300ms.
> When I turn on trace2 I can see that almost all the time is spent in
> read_directory and it's visiting 26960 directories and 77989 paths.
>
> I can use `git add <foo>` or `git add -u .` to speed things up but
> `git add .` is the most convenient for us. I created a small script to
> pipe the results of `git status` to `git add` and that runs in <500ms.
> This leaves me confused as to why the built-in performance is so slow.
What you're asking for with those commands is different. `git add -u .`
says, "Please enumerate only those files that are in the index, and if
they are modified or removed, update the index." `git add .` says,
"Please enumerate every file in the working tree recursively and
determine if there are any non-ignored changes, and then update the
index." (Note that a file that matches an ignore pattern but is already
tracked is not ignored, which affects the performance here.)
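To make the difference concrete, here is a toy Python model (not Git's actual code) in which the index and the working tree are plain dicts mapping paths to contents. `add_update` stands in for `git add -u` and only visits index entries; `add_all` stands in for `git add .` and must also visit every working-tree path, checking ignore patterns, which cannot exclude a path that is already tracked. All names here are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def is_ignored(path: str, patterns: list[str], index: dict[str, str]) -> bool:
    """A path is ignored if a pattern matches its name, unless it is tracked."""
    if path in index:
        return False  # tracked files are never ignored, so matches need an index check
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, p) for p in patterns)


def add_update(index: dict[str, str],
               worktree: dict[str, str]) -> dict[str, Optional[str]]:
    """Toy `git add -u`: visit only paths already in the index."""
    changes: dict[str, Optional[str]] = {}
    for path, content in index.items():
        if path not in worktree:
            changes[path] = None              # removed: stage the deletion
        elif worktree[path] != content:
            changes[path] = worktree[path]    # modified: stage new content
    return changes


def add_all(index: dict[str, str], worktree: dict[str, str],
            patterns: list[str]) -> dict[str, Optional[str]]:
    """Toy `git add .`: visit every working-tree path as well."""
    changes: dict[str, Optional[str]] = {}
    for path, content in worktree.items():
        if is_ignored(path, patterns, index):
            continue
        if index.get(path) != content:
            changes[path] = content           # new (untracked) or modified
    for path in index:
        if path not in worktree:
            changes[path] = None              # removed
    return changes
```

The cost of `add_update` is bounded by the number of index entries, while `add_all` scales with the whole working tree; in real Git that second loop is a recursive directory walk.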
Notably, the former does not add new files that are untracked, but the
latter does. That means that the code needs to know if there are any
new untracked files. The untracked cache is not used when you specify
a pathspec on the command line because in the general case, it doesn't
have to be just `.` and it could be something like a match on an
attribute or a glob pattern, which would make the code very complex in
dealing with that case. It is, however, used when you _don't_ specify a
pathspec (such as `git add -u`), as well as for `git status`, since
those operate on the whole tree without any pathspecs.
When you pipe the results of `git status` to `git add`, you are
effectively using the `-u` option, since that will only ever list files
that are tracked.
I realize `git add .` is very convenient, but it does ask to do
substantially more work than `git add -u` (which I use quite
frequently), and so it can definitely perform worse, especially in
large repositories. You can, of course, continue to use it, but you
can't expect them to perform identically. My recommendation would be to
use `git add -u` unless you need to add new files, since that's going to
perform better. Once you get used to it, it's pretty easy to use.
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
* Re: Slow git add . performance in large repo
From: Yissachar Radcliffe @ 2025-03-18 2:04 UTC
To: brian m. carlson, Yissachar Radcliffe, git
> When you pipe the results of `git status` to `git add`, you are
> effectively using the `-u` option, since that will only ever list files
> that are tracked.
I'm not sure what you mean by this; `git status` lists untracked
files. For instance, if I `touch foo.txt` and `git add -u .` then
foo.txt will not be staged. But if I pipe the changes from `git
status` into `git add` then it will be added.
> The untracked cache is not used when you specify
> a pathspec on the command line because in the general case, it doesn't
> have to be just `.` and it could be something like a match on an
> attribute or a glob pattern, which would make the code very complex in
> dealing with that case.
Is there a reason `git add .` couldn't use the untracked cache even if
other pathspecs didn't? I have to imagine that `.` is by far the most
common pathspec used and there would be value in speeding that up.
> You can, of course, continue to use it, but you
> can't expect them to perform identically.
I wouldn't expect them to perform identically, but given how much
faster it runs when piping in the data from `git status` I think it's
reasonable to expect it to run much faster than it does today.
* Re: Slow git add . performance in large repo
From: brian m. carlson @ 2025-03-18 21:06 UTC
To: Yissachar Radcliffe; +Cc: git
On 2025-03-18 at 02:04:43, Yissachar Radcliffe wrote:
> > When you pipe the results of `git status` to `git add`, you are
> > effectively using the `-u` option, since that will only ever list files
> > that are tracked.
>
> I'm not sure what you mean by this; `git status` lists untracked
> files. For instance, if I `touch foo.txt` and `git add -u .` then
> foo.txt will not be staged. But if I pipe the changes from `git
> status` into `git add` then it will be added.
Ah, I thought you meant piping the entries without `??`, which are
already in the index. Yes, this is faster because it uses the untracked
cache in many cases.
> Is there a reason `git add .` couldn't use the untracked cache even if
> other pathspecs didn't? I have to imagine that `.` is by far the most
> common pathspec used and there would be value in speeding that up.
I don't see why it's impossible, but nobody has sent a patch. Most Git
developers don't use `git add .` because there are better options and
it typically isn't recommended to just add everything, so it hasn't been
implemented.
> I wouldn't expect them to perform identically, but given how much
> faster it runs when piping in the data from `git status` I think it's
> reasonable to expect it to run much faster than it does today.
As I said above, `git add .` isn't something I expect most Git
developers to use on a daily basis. It's very easy to accidentally add
something you didn't intend, such as a build product that was formerly
ignored but now is not (because it's no longer generated and someone
removed the pattern), so it's not an approach that we typically
recommend for that reason. The possibility of files that have
accidentally not been ignored properly is not at all uncommon, and I run
into it probably a couple times a month between work and home, even
though I work with people who usually range from moderately to
intimately familiar with Git.
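One way to follow this advice without losing much convenience is to review new files before staging them. The Python sketch below is a hedged example, assuming it runs against the top level of the working tree; the helper names are hypothetical. It lists untracked, non-ignored files with `git ls-files --others --exclude-standard -z` (the files `git add .` would newly pick up), then stages tracked changes with `git add -u` plus only the new files you explicitly chose.

```python
import subprocess


def split_nul(data: bytes) -> list[str]:
    """Split NUL-terminated output (from a `-z` option) into path strings."""
    return [p.decode("utf-8", "surrogateescape") for p in data.split(b"\0") if p]


def untracked_files(repo: str = ".") -> list[str]:
    """Untracked, non-ignored files: candidates to review before adding."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "--others", "--exclude-standard", "-z"],
        check=True, capture_output=True,
    ).stdout
    return split_nul(out)


def stage(repo: str, reviewed_new_files: list[str]) -> None:
    """Stage tracked changes with `git add -u`, then only the chosen new files."""
    subprocess.run(["git", "-C", repo, "add", "-u"], check=True)
    if reviewed_new_files:
        # "--" keeps file names that start with "-" from being read as options.
        subprocess.run(
            ["git", "-C", repo, "add", "--", *reviewed_new_files], check=True
        )
```

Unlike `git status`, which collapses a wholly untracked directory into a single `dir/` entry by default, `ls-files --others` lists every file individually, which makes a stray build product easier to spot.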
If you feel strongly that this should exist, then the code is in `dir.c`
(search for `pathspec`), and you could add a special case for this and
send a patch. That doesn't guarantee that it will be accepted, but it
certainly is more likely if you send a patch.
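As a rough, hypothetical sketch of the condition such a special case would need (not code from `dir.c`), the Python model below encodes the rule described in this thread (no pathspec means the untracked cache may be used) plus the proposed exception for a bare `.`. The subtle point is that `.` is relative to the current directory, so it only selects the whole tree when the command runs from the top level (an empty prefix).

```python
def covers_whole_tree(pathspecs: list[str], prefix: str) -> bool:
    """Hypothetical check: does this pathspec list select the entire working tree?"""
    # `.` means "the current directory", so it is only the whole tree when
    # run at the top level, where the prefix is empty.
    return prefix == "" and pathspecs == ["."]


def may_use_untracked_cache(pathspecs: list[str], prefix: str,
                            special_case_dot: bool = False) -> bool:
    """Toy model of when a directory walk could consult the untracked cache."""
    if not pathspecs:
        # Rule described in the thread: no pathspec means a whole-tree walk.
        return True
    # Today any pathspec disables the cache; the proposed special case would
    # let a bare top-level `.` through because it selects the same paths.
    return special_case_dot and covers_whole_tree(pathspecs, prefix)
```

A real patch would work on the parsed `struct pathspec` rather than raw strings, and would need to treat equivalent spellings (such as `:/`) consistently.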
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA