* Slow git add . performance in large repo

From: Yissachar Radcliffe @ 2025-03-17 18:53 UTC
To: git

We have a relatively large git repo and have noticed that `git add .`
operations are slow (~1.5-2s). We have core.fsmonitor and
core.untrackedCache set to true and `git status` executes in ~300ms.
When I turn on trace2 I can see that almost all the time is spent in
read_directo and it's visiting 26960 directories and 77989 paths.

I can use `git add <foo>` or `git add -u .` to speed things up but
`git add .` is the most convenient for us. I created a small script to
pipe the results of `git status` to `git add` and that runs in <500ms.
This leaves me confused as to why the built-in performance is so slow.

Git version is 2.49.0, macOS 15.3.
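The poster's script is not included in the thread. As a rough sketch of what "pipe the results of `git status` to `git add`" could look like, the Python below parses the NUL-separated porcelain output (`git status --porcelain=v1 -z`), which sidesteps quoting problems with unusual filenames, and feeds the listed paths to `git add` on stdin via `--pathspec-from-file=- --pathspec-file-nul`. The function names `parse_porcelain_z` and `stage_status_paths` are hypothetical. The sketch assumes it runs from the repository root, because porcelain v1 paths are always relative to the root, and it uses `--literal-pathspecs` so filenames containing glob characters are not treated as patterns.

```python
import os
import subprocess


def parse_porcelain_z(data: bytes) -> list[tuple[str, str]]:
    """Split `git status --porcelain=v1 -z` output into (XY, path) pairs.

    Each entry is "XY PATH" followed by NUL. Renames and copies (R or C in
    the XY status) carry an extra NUL-terminated field with the original
    path, which is skipped here because only the new path needs staging.
    """
    fields = data.split(b"\0")
    entries = []
    i = 0
    while i < len(fields):
        field = fields[i]
        i += 1
        if not field:
            continue  # the empty string after the final NUL terminator
        xy = field[:2].decode("ascii")
        path = os.fsdecode(field[3:])  # byte 2 is the separating space
        if "R" in xy or "C" in xy:
            i += 1  # skip the original (pre-rename/copy) path
        entries.append((xy, path))
    return entries


def stage_status_paths(repo_root: str = ".") -> list[str]:
    """Hypothetical helper: stage every path that `git status` reports."""
    status = subprocess.run(
        ["git", "status", "--porcelain=v1", "-z"],
        cwd=repo_root, check=True, capture_output=True,
    ).stdout
    # "!!" (ignored) entries only appear with --ignored; filter defensively.
    paths = [p for xy, p in parse_porcelain_z(status) if xy != "!!"]
    if paths:
        subprocess.run(
            ["git", "--literal-pathspecs", "add",
             "--pathspec-from-file=-", "--pathspec-file-nul"],
            cwd=repo_root, check=True,
            input=b"\0".join(os.fsencode(p) for p in paths),
        )
    return paths
```

Unlike `git add -u`, this includes untracked (`??`) entries, so new files are staged too. The thread reports this kind of approach running in under 500ms on the poster's repository, versus ~1.5-2s for `git add .`.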
* Re: Slow git add . performance in large repo

From: brian m. carlson @ 2025-03-17 23:00 UTC
To: Yissachar Radcliffe
Cc: git

On 2025-03-17 at 18:53:10, Yissachar Radcliffe wrote:
> We have a relatively large git repo and have noticed that `git add .`
> operations are slow (~1.5-2s). We have core.fsmonitor and
> core.untrackedCache set to true and `git status` executes in ~300ms.
> When I turn on trace2 I can see that almost all the time is spent in
> read_directo and it's visiting 26960 directories and 77989 paths.
>
> I can use `git add <foo>` or `git add -u .` to speed things up but
> `git add .` is the most convenient for us. I created a small script to
> pipe the results of `git status` to `git add` and that runs in <500ms.
> This leaves me confused as to why the built-in performance is so slow.

What you're asking for with those commands is different.

`git add -u .` says, "Please enumerate only those files that are in the
index, and if they are modified or removed, update the index."

`git add .` says, "Please enumerate every file in the working tree
recursively and determine if there are any non-ignored changes, and then
update the index." (Note that a file that matches an ignore pattern but
is already tracked is not ignored, which affects the performance here.)

Notably, the former does not add new files that are untracked, but the
latter does. That means that the code needs to know if there are any new
untracked files.
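To make the difference concrete, here is a toy Python model (not Git's actual code) of which paths each command would stage. It represents the index and working tree as hypothetical path-to-content dictionaries and uses `fnmatch` as a rough stand-in for `.gitignore` matching, whose real rules are richer. What it illustrates is that the `-u` form only loops over index entries, while the `.` form must also walk every path in the working tree to find new untracked files, and that a tracked file is updated even when it matches an ignore pattern.

```python
from fnmatch import fnmatch


def paths_to_stage(index: dict[str, str], worktree: dict[str, str],
                   ignore_patterns: list[str], update_only: bool) -> set[str]:
    """Toy model of `git add -u .` (update_only=True) vs `git add .`.

    index and worktree map path -> file content; a path missing from
    worktree has been deleted.
    """
    staged = set()
    # Both forms: walk the index and pick up modified or removed files.
    for path, content in index.items():
        if worktree.get(path) != content:
            staged.add(path)
    if update_only:
        return staged
    # `git add .` only: walk the whole working tree looking for new files.
    for path in worktree:
        if path in index:
            continue  # tracked files are never treated as ignored
        if any(fnmatch(path, pattern) for pattern in ignore_patterns):
            continue  # untracked and ignored: skipped
        staged.add(path)
    return staged


index = {"a.txt": "v1", "b.log": "v1"}
worktree = {"a.txt": "v2", "b.log": "v2", "new.txt": "x", "build.log": "y"}
print(sorted(paths_to_stage(index, worktree, ["*.log"], update_only=True)))
# -> ['a.txt', 'b.log']
print(sorted(paths_to_stage(index, worktree, ["*.log"], update_only=False)))
# -> ['a.txt', 'b.log', 'new.txt']
```

The tracked `b.log` is staged in both modes despite matching `*.log`, while the untracked `build.log` is skipped; only the second loop, which scales with the size of the working tree, is extra work for `git add .`.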
The untracked cache is not used when you specify a pathspec on the
command line because in the general case, it doesn't have to be just `.`
and it could be something like a match on an attribute or a glob
pattern, which would make the code very complex in dealing with that
case. It is, however, used when you _don't_ specify a pathspec (such as
`git add -u`), as well as for `git status`, since those operate on the
whole tree without any pathspecs.

When you pipe the results of `git status` to `git add`, you are
effectively using the `-u` option, since that will only ever list files
that are tracked.

I realize `git add .` is very convenient, but it does ask to do
substantially more work than `git add -u` (which I use quite
frequently), and so it can definitely perform worse, especially in large
repositories. You can, of course, continue to use it, but you can't
expect them to perform identically.

My recommendation would be to use `git add -u` unless you need to add
new files, since that's going to perform better. Once you get used to
it, it's pretty easy to use.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
* Re: Slow git add . performance in large repo

From: Yissachar Radcliffe @ 2025-03-18 2:04 UTC
To: brian m. carlson, Yissachar Radcliffe, git

> When you pipe the results of `git status` to `git add`, you are
> effectively using the `-u` option, since that will only ever list files
> that are tracked.

I'm not sure what you mean by this; `git status` lists untracked
files. For instance, if I `touch foo.txt` and `git add -u .` then
foo.txt will not be staged. But if I pipe the changes from `git status`
into `git add` then it will be added.

> The untracked cache is not used when you specify
> a pathspec on the command line because in the general case, it doesn't
> have to be just `.` and it could be something like a match on an
> attribute or a glob pattern, which would make the code very complex in
> dealing with that case.

Is there a reason `git add .` couldn't use the untracked cache even if
other pathspecs didn't? I have to imagine that `.` is by far the most
common pathspec used and there would be value in speeding that up.

> You can, of course, continue to use it, but you
> can't expect them to perform identically.

I wouldn't expect them to perform identically, but given how much
faster it runs when piping in the data from `git status` I think it's
reasonable to expect it to run much faster than it does today.
* Re: Slow git add . performance in large repo

From: brian m. carlson @ 2025-03-18 21:06 UTC
To: Yissachar Radcliffe
Cc: git

On 2025-03-18 at 02:04:43, Yissachar Radcliffe wrote:
> > When you pipe the results of `git status` to `git add`, you are
> > effectively using the `-u` option, since that will only ever list files
> > that are tracked.
>
> I'm not sure what you mean by this; `git status` lists untracked
> files. For instance, if I `touch foo.txt` and `git add -u .` then
> foo.txt will not be staged. But if I pipe the changes from `git status`
> into `git add` then it will be added.

Ah, I thought you meant piping the entries without `??`, which are
already in the index. Yes, this is faster because it uses the untracked
cache in many cases.

> Is there a reason `git add .` couldn't use the untracked cache even if
> other pathspecs didn't? I have to imagine that `.` is by far the most
> common pathspec used and there would be value in speeding that up.

I don't see why it's impossible, but nobody has sent a patch. Most Git
developers don't use `git add .` because there are better options and
typically it isn't recommended to just add everything, so it hasn't been
implemented.

> I wouldn't expect them to perform identically, but given how much
> faster it runs when piping in the data from `git status` I think it's
> reasonable to expect it to run much faster than it does today.

As I said above, `git add .` isn't something I expect most Git
developers to use on a daily basis. It's very easy to accidentally add
something you didn't intend, such as a build product that was formerly
ignored but now is not (because it's no longer generated and someone
removed the pattern), so it's not an approach that we typically
recommend for that reason.
Files that accidentally aren't ignored properly are not at all uncommon,
and I run into them probably a couple of times a month between work and
home, even though I work with people who usually range from moderately
to intimately familiar with Git.

If you feel strongly that this should exist, then the code is in `dir.c`
(search for `pathspec`), and you could add a special case for this and
send a patch. That doesn't guarantee that it will be accepted, but it
certainly is more likely if you send a patch.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA