* Committing crimes with NTFS-3G
@ 2024-08-29 20:43 Roman Sandu
2024-08-30 0:47 ` brian m. carlson
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Roman Sandu @ 2024-08-29 20:43 UTC (permalink / raw)
To: git
Good day!
I have a decently sized (80K files) monorepo on an NTFS drive that I
have been working with for a while under Windows via git-for-windows.
Recently, I had to (temporarily) switch to Ubuntu (24.04) via dual-boot
for irrelevant reasons, and I decided that simply mounting my NTFS drive
and using the monorepo from Ubuntu is a great idea, actually, as NTFS-3G
allows for seamless interop with NTFS via UserMapping. And so that is
exactly what I did and It Just Works!
Except it kind of does not. Every time I run `git status` it takes 8
seconds, which is very painful when doing tricky history rewriting.
To diagnose the problem, I ran git status with GIT_TRACE_PERFORMANCE
enabled, and what I see is that the "refresh index" region is taking up
99% of the time. Digging further, `strace -fc git status` tells me that
99% of the time is spent on newfstatat'ing files. Okay, makes sense,
stat'ing files through FUSE is not all that quick. But how many files
are we talking about? My repository has `feature.manyFiles` enabled in
git, so I would expect `core.untrackedCache` to make it so that `git
status` skips basically everything except for the root folder which
contains, what, 20 subfolders? But it actually does >96K stat calls!
Which is more than the number of files in the repository in total.
Briefly looking at the output of `strace -f git status`, I see that git
indeed goes through basically all of the repository, even things that
have not changed for years, as if `core.untrackedCache` is not actually
enabled. Manually enabling it on top of `feature.manyFiles` does not
help. Note that `git update-index --test-untracked-cache` tells me that
mtime does indeed work, and I've also manually stat'ed some folders
which `git status` re-stats on every run and I see that the modify time
is indeed a couple of hours ago, yet even when running `git status`
several times in a row it re-scans the entire folder every time.
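To see where the stat calls concentrate, the strace log can be summarized per top-level directory. The following is a minimal sketch, not something from the thread: the function name `count_stat_calls` is made up for illustration, and it assumes the log was written with `strace -f -o <file> git status` so that each call sits on one line.

```shell
# Count newfstatat() calls per top-level directory in an strace log.
# Usage: count_stat_calls <strace-log>
count_stat_calls() {
    # Pull the quoted path argument out of every newfstatat() line,
    # keep only the first path component, then tally and sort.
    sed -n 's/.*newfstatat([^,]*, "\([^"]*\)".*/\1/p' "$1" |
        cut -d/ -f1 |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

For example, after `strace -f -o /tmp/status.trace git status`, running `count_stat_calls /tmp/status.trace` prints one `count directory` pair per line, largest first. Absolute paths all land in a single bucket with an empty name.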
So, what do I do about this? It honestly looks like a git bug to me,
maybe it silently fails to update the index with new timestamps because
it was initially created on Windows? But I have no clue how to narrow
this issue down further, so any ideas or suggestions would be appreciated!
Regards,
Roman Sandu
* Re: Committing crimes with NTFS-3G
2024-08-29 20:43 Committing crimes with NTFS-3G Roman Sandu
@ 2024-08-30 0:47 ` brian m. carlson
2024-08-30 12:52 ` Roman Sandu
2024-08-30 4:18 ` Vito Caputo
2024-08-30 4:58 ` Johannes Sixt
2 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2024-08-30 0:47 UTC (permalink / raw)
To: Roman Sandu; +Cc: git
On 2024-08-29 at 20:43:40, Roman Sandu wrote:
> Good day!
>
> I have a decently sized (80K files) monorepo on an NTFS drive that I have
> been working with for a while under Windows via git-for-windows. Recently, I
> had to (temporarily) switch to Ubuntu (24.04) via dual-boot for irrelevant
> reasons, and I decided that simply mounting my NTFS drive and using the
> monorepo from Ubuntu is a great idea, actually, as NTFS-3G allows for
> seamless interop with NTFS via UserMapping. And so that is exactly what I
> did and It Just Works!
In general, I would not recommend this. NTFS doesn't support Unix
permissions, so I'd expect a lot of things to be broken. A lot of
people like using NTFS to share data across Windows and Linux, but UDF
is a much better choice and I'm not surprised that NTFS isn't working
the way you'd expect.
Also, when you share a repository across systems, you should expect the
index to be fully refreshed each time you change the OS, reading every
file in the repository[0].
> So, what do I do about this? It honestly looks like a git bug to me, maybe
> it silently fails to update the index with new timestamps because it was
> initially created on Windows? But I have no clue how to narrow this issue
> down further, so any ideas or suggestions would be appreciated!
Can you pick some file in your repository and run `stat` on it, before
and after running `git status`, and include the output?
For example:
  stat http.c | tee /tmp/before
  git status
  stat http.c | tee /tmp/after
  sha256sum /tmp/before /tmp/after
My guess is that NTFS-3G is not emulating something properly and it's
differing at some point.
[0] https://git-scm.com/docs/gitfaq#sync-working-tree
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
* Re: Committing crimes with NTFS-3G
2024-08-29 20:43 Committing crimes with NTFS-3G Roman Sandu
2024-08-30 0:47 ` brian m. carlson
@ 2024-08-30 4:18 ` Vito Caputo
2024-08-30 4:58 ` Johannes Sixt
2 siblings, 0 replies; 14+ messages in thread
From: Vito Caputo @ 2024-08-30 4:18 UTC (permalink / raw)
To: Roman Sandu; +Cc: git
On Thu, Aug 29, 2024 at 11:43:40PM +0300, Roman Sandu wrote:
> Good day!
>
> I have a decently sized (80K files) monorepo on an NTFS drive that I have
> been working with for a while under Windows via git-for-windows. Recently, I
> had to (temporarily) switch to Ubuntu (24.04) via dual-boot for irrelevant
> reasons, and I decided that simply mounting my NTFS drive and using the
> monorepo from Ubuntu is a great idea, actually, as NTFS-3G allows for
> seamless interop with NTFS via UserMapping. And so that is exactly what I
> did and It Just Works!
>
> Except it kind of does not. Every time I run `git status` it takes 8
> seconds, which is very painful when doing tricky history rewriting.
>
> To diagnose the problem, I ran git status with GIT_TRACE_PERFORMANCE
> enabled, and what I see is that the "refresh index" region is taking up 99%
> of the time. Digging further, `strace -fc git status` tells me that 99% of
> the time is spent on newfstatat'ing files. Okay, makes sense, stat'ing files
> through FUSE is not all that quick. But how many files are we talking about?
> My repository has `feature.manyFiles` enabled in git, so I would expect
> `core.untrackedCache` to make it so that `git status` skips basically
> everything except for the root folder which contains, what, 20 subfolders?
> But it actually does >96K stat calls! Which is more than the number of files
> in the repository in total. Briefly looking at the output of `strace -f git
> status`, I see that git indeed goes through basically all of the repository,
> even things that have not changed for years, as if `core.untrackedCache` is
> not actually enabled. Manually enabling it on top of `feature.manyFiles`
> does not help. Note that `git update-index --test-untracked-cache` tells me
> that mtime does indeed work, and I've also manually stat'ed some folders
> which `git status` re-stats on every run and I see that the modify time is
> indeed a couple of hours ago, yet even when running `git status` several
> times in a row it re-scans the entire folder every time.
>
> So, what do I do about this? It honestly looks like a git bug to me, maybe
> it silently fails to update the index with new timestamps because it was
> initially created on Windows? But I have no clue how to narrow this issue
> down further, so any ideas or suggestions would be appreciated!
>
It was pretty big news that Paragon's read-write NTFS driver was merged
into the kernel. You might want to give that a try if your main problem
is performance.
https://lore.kernel.org/lkml/CAHk-=wjn4W-7ZbHrw08cWy=12DgheFUKLO5YLgG6in5TA5HxqQ@mail.gmail.com/
Regards,
Vito Caputo
* Re: Committing crimes with NTFS-3G
2024-08-29 20:43 Committing crimes with NTFS-3G Roman Sandu
2024-08-30 0:47 ` brian m. carlson
2024-08-30 4:18 ` Vito Caputo
@ 2024-08-30 4:58 ` Johannes Sixt
2024-08-30 12:41 ` Roman Sandu
2024-08-30 16:28 ` Junio C Hamano
2 siblings, 2 replies; 14+ messages in thread
From: Johannes Sixt @ 2024-08-30 4:58 UTC (permalink / raw)
To: Roman Sandu; +Cc: git
Am 29.08.24 um 22:43 schrieb Roman Sandu:
> To diagnose the problem, I ran git status with GIT_TRACE_PERFORMANCE
> enabled, and what I see is that the "refresh index" region is taking up
> 99% of the time.
Of course. The stat information that Git on Linux caches in the index is
vastly different from that that Git for Windows caches. So every time
you switch OS, all files appear modified to Git.
I suggest that you don't switch OS on a whim and take the 8 seconds
delay once when you have to.
-- Hannes
* Re: Committing crimes with NTFS-3G
2024-08-30 4:58 ` Johannes Sixt
@ 2024-08-30 12:41 ` Roman Sandu
2024-08-30 16:28 ` Junio C Hamano
1 sibling, 0 replies; 14+ messages in thread
From: Roman Sandu @ 2024-08-30 12:41 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
> I suggest that you don't switch OS on a whim and take the 8 seconds
> delay once when you have to.
I am aware that git will refresh the index every time I switch OSes.
That is NOT the problem I am having. The index is being fully refreshed
on *every single* `git status` after I switch to Ubuntu, not just the
first one!
* Re: Committing crimes with NTFS-3G
2024-08-30 0:47 ` brian m. carlson
@ 2024-08-30 12:52 ` Roman Sandu
2024-08-30 15:02 ` brian m. carlson
0 siblings, 1 reply; 14+ messages in thread
From: Roman Sandu @ 2024-08-30 12:52 UTC (permalink / raw)
To: brian m. carlson, git
Yes, I am aware that the index will be fully refreshed on the first run
of status. That is completely acceptable. But that is not what I am
observing, it is being refreshed on every single run of `git status`!
After running stat before and after a status, the sha256 is identical.
Both for files and for folders. Maybe Windows has somehow corrupted the
index with its negative aura which makes git invalidate it on every
single run? Are there tools in git to diagnose the reason for the
index's cache being invalidated?
On the topic of Unix permissions -- as far as I can tell, they ARE in
fact working, because I've created a UserMapping file that tells NTFS-3G
how to map between NTFS/Windows user SIDs and Linux user IDs. See
NTFS-3G's github page (https://github.com/tuxera/ntfs-3g).
* Re: Committing crimes with NTFS-3G
2024-08-30 12:52 ` Roman Sandu
@ 2024-08-30 15:02 ` brian m. carlson
2024-08-30 19:25 ` Roman Sandu
0 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2024-08-30 15:02 UTC (permalink / raw)
To: Roman Sandu; +Cc: git
On 2024-08-30 at 12:52:05, Roman Sandu wrote:
> Yes, I am aware that the index will be fully refreshed on the first run of
> status. That is completely acceptable. But that is not what I am observing,
> it is being refreshed on every single run of `git status`!
>
> After running stat before and after a status, the sha256 is identical. Both
> for files and for folders. Maybe Windows has somehow corrupted the index
> with its negative aura which makes git invalidate it on every single run?
> Are there tools in git to diagnose the reason for the index's cache being
> invalidated?
It would still be helpful to see the output of the `stat` command, since
that would tell us useful things about what's causing Git to think the
data has changed. For example, some systems lack certain timestamp
granularity, which can break Git when compiled in certain ways.
You can see if setting `core.trustctime` to false fixes it, and you can
also try setting `core.checkStat` to `minimal`. You should try them in
that order to see if they fix things.
Also, what version of Git are you using? Is it the one in Ubuntu 24.04,
or the one from the git-core PPA, or a different one?
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
* Re: Committing crimes with NTFS-3G
2024-08-30 19:25 ` Roman Sandu
@ 2024-08-30 15:55 ` brian m. carlson
2024-08-30 22:00 ` Roman Sandu
0 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2024-08-30 15:55 UTC (permalink / raw)
To: Roman Sandu; +Cc: git
On 2024-08-30 at 19:25:56, Roman Sandu wrote:
> The stat output for a random file in the root of the repository is as
> follows:
> ```
> File: <CENSORED>
> Size: 91876 Blocks: 184 IO Block: 4096 regular file
> Device: 259,2 Inode: 4630629 Links: 1
> Access: (0664/-rw-rw-r--) Uid: ( 1000/romasandu) Gid: ( 1000/romasandu)
> Access: 2024-08-29 17:41:04.855126300 +0300
> Modify: 2024-08-29 17:41:04.855609000 +0300
> Change: 2024-08-29 17:41:04.855609000 +0300
> Birth: -
> ```
> Maybe lack of a birth stat is what drives git crazy?
That doesn't exist in POSIX, so it isn't used in Git.
I looked at the Ubuntu git package and it doesn't use `USE_NSEC`, so
your lack of nanosecond precision in timestamps probably isn't the
problem here.
You may want to try using a utility like
https://github.com/shogo82148/git-dump-index to dump the index and find
out what might have changed. You can use `stat -c` to write the data
for the actual files in the same format, and then run `diff` on the two
to find out where they disagree. Or, perhaps you can just eyeball it,
in case there's something obvious (like a `uid` difference).
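One way to get comparable output from `stat -c` is to print exactly the fields that Git caches per index entry (ctime, mtime, device, inode, mode, uid, gid, size). A rough sketch, assuming GNU coreutils `stat`; the helper name `stat_fields` and the output layout are arbitrary choices, not the format of any particular index dumper:

```shell
# Print the stat fields Git caches per index entry, one file per line.
# Relies on GNU coreutils stat(1) format sequences:
#   %Z ctime (s)  %Y mtime (s)  %d device  %i inode
#   %a permission bits (octal)  %u uid  %g gid  %s size  %n name
stat_fields() {
    stat -c '%n ctime=%Z mtime=%Y dev=%d ino=%i mode=%a uid=%u gid=%g size=%s' -- "$@"
}
```

Running it on the same files before and after `git status` and diffing the two outputs shows whether anything moves on the filesystem side; the values can also be read against `git ls-files --debug`, which prints the cached fields from the index.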
Or, you could try instrumenting `match_stat_data` or
`stat_validity_check` in `statinfo.c` and printing the data that's
changed.
You might also try disabling untracked cache and see if that fixes the
problem. It might be that there _is_ a bug in that the untracked cache
information isn't correctly refreshed when it was originally written on
a different platform. It's known that Windows writes different
information into the index than Unix systems and perhaps that
information doesn't get refreshed properly.
One other thought: Windows stores symlinks with a different size than
most Unix systems. Windows tends to give them a full block size,
whereas Unix gives them a size of their length in bytes. That
definitely breaks using symlinks in a repository across Windows and WSL.
I don't know if that's what's going on here, but of course it could be
related.
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
* Re: Committing crimes with NTFS-3G
2024-08-30 4:58 ` Johannes Sixt
2024-08-30 12:41 ` Roman Sandu
@ 2024-08-30 16:28 ` Junio C Hamano
2024-09-03 15:58 ` Torsten Bögershausen
1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2024-08-30 16:28 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Roman Sandu, git
Johannes Sixt <j6t@kdbg.org> writes:
> Am 29.08.24 um 22:43 schrieb Roman Sandu:
>> To diagnose the problem, I ran git status with GIT_TRACE_PERFORMANCE
>> enabled, and what I see is that the "refresh index" region is taking up
>> 99% of the time.
>
> Of course. The stat information that Git on Linux caches in the index is
> vastly different from that that Git for Windows caches. So every time
> you switch OS, all files appear modified to Git.
>
> I suggest that you don't switch OS on a whim and take the 8 seconds
> delay once when you have to.
I somehow got an impression that the hit is not just that we need to
adjust cached lstat information in the index file once to the new
filesystem implementation after an OS switch, but every time (as if
we are forced to be extra careful and rehash every time until the
things improve, somewhat like how we handle the racy-Git situation).
Timestamps given by these OSes are not consistent and the clock
appears to have rewound, or something?
Timestamps of files in the working tree ordinarily should match
timestamps in the cached lstat information of these paths in the
index, and timestamp of the index file itself should be newer than
any of the above, or the racy-Git prevention code may tell us to
play safe.
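The racy-Git condition described above can be approximated from the shell. This is only a sketch under assumptions: Git compares the mtime cached in each index entry with the timestamp of the index itself, whereas the hypothetical helper `maybe_racy` below just compares the two files' current mtimes at one-second granularity (GNU `stat` assumed).

```shell
# Exit status 0 if FILE's mtime is not older than INDEX's mtime,
# i.e. the entry could be considered "racily clean".
# Usage: maybe_racy FILE INDEX
maybe_racy() {
    file_mtime=$(stat -c %Y -- "$1") || return 2
    index_mtime=$(stat -c %Y -- "$2") || return 2
    # Equal seconds count as racy: without sub-second precision the
    # file may have changed after the index was written.
    [ "$file_mtime" -ge "$index_mtime" ]
}
```

For instance, `maybe_racy some/file .git/index && echo racy`. If a large share of tracked files pass this test, Git would have to compare their contents rather than trust the cached stat data.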
I do not do Windows and/or NTFS, but I have to wonder if the smudge
filters (including the EOL conversion) play a role in this situation
as the working tree is getting switched between LF-native and
CRLF-native systems. Might there be situations where the system must
spend time only to realize that there is nothing it needs to do to
canonicalize the file contents and there are no modifications between
the HEAD commit, the index, and the working tree files, or something
silly like that?
Thanks.
* Re: Committing crimes with NTFS-3G
2024-08-30 15:02 ` brian m. carlson
@ 2024-08-30 19:25 ` Roman Sandu
2024-08-30 15:55 ` brian m. carlson
0 siblings, 1 reply; 14+ messages in thread
From: Roman Sandu @ 2024-08-30 19:25 UTC (permalink / raw)
To: brian m. carlson, git
The stat output for a random file in the root of the repository is as
follows:
```
File: <CENSORED>
Size: 91876 Blocks: 184 IO Block: 4096 regular file
Device: 259,2 Inode: 4630629 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/romasandu) Gid: ( 1000/romasandu)
Access: 2024-08-29 17:41:04.855126300 +0300
Modify: 2024-08-29 17:41:04.855609000 +0300
Change: 2024-08-29 17:41:04.855609000 +0300
Birth: -
```
Maybe lack of a birth stat is what drives git crazy?
My git version is 2.43.0, which is the one from the Ubuntu 24.04 repo.
Neither `core.trustctime` nor `core.checkStat minimal` (nor a
combination of them) help.
* Re: Committing crimes with NTFS-3G
2024-08-30 15:55 ` brian m. carlson
@ 2024-08-30 22:00 ` Roman Sandu
0 siblings, 0 replies; 14+ messages in thread
From: Roman Sandu @ 2024-08-30 22:00 UTC (permalink / raw)
To: brian m. carlson, git
So I couldn't figure out how to make stat show me exactly the same
format as the utility you have linked, but everything seems to match up
(up to nsec c/m time which I could not get from stat).
BUT!
The Windows index is definitely corrupted. The utility crashes on it.
After modifying the utility to compare `entry->ce_namelen` against
`strlen(entry->name)`, I found out that they differ for a bunch of
entries, which at some point causes an unfortunate jump which lands
outside of the index.
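One thing worth ruling out before concluding that the index is corrupted: `feature.manyFiles` defaults `index.version` to 4, and version 4 entries store path names prefix-compressed, so a parser written for version 2 or 3 would see name lengths that disagree with the stored strings. Whether that is what the utility tripped over is an assumption, not something confirmed in the thread. The index file starts with the 4-byte signature `DIRC` followed by a 4-byte big-endian version number, so the version can be read directly; `index_version` is a hypothetical helper name.

```shell
# Print the on-disk format version of a Git index file.
# Usage: index_version [path-to-index]   (defaults to .git/index)
index_version() {
    idx=${1:-.git/index}
    sig=$(head -c 4 "$idx")
    [ "$sig" = "DIRC" ] || { echo "not a Git index: $idx" >&2; return 1; }
    # Bytes 4..7 hold the version as a big-endian 32-bit integer.
    # Word splitting of od's output is intentional here.
    set -- $(od -An -tu1 -j4 -N4 "$idx")
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}
```

If this prints 4 for the index written on Windows, a dumper that only understands versions 2 and 3 is not a reliable witness for corruption.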
Hence a question: shouldn't git validate this somehow? I get that the
length of the name is stored separately for speed, but maybe we can have
a special "validate the index" subcommand?
I have fixed the wrong name lengths via the same utility, hoping that it
would help. But sadly, it didn't.
Modifying the script further, I made it stat every single cache entry's
actual file and compare everything. Et voila: mode differs! Git for
Windows apparently defaults everything to 644, while NTFS-3G tries to
support actual permissions with UserMapping enabled and so some files
have 664, while others have 777, and more for other files on the drive
but not in the repo.
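A possible explanation for the 644 entries, offered as background rather than a diagnosis: for regular files Git records only one of two modes in the index, 100644 or 100755, chosen by the owner-execute bit; the remaining permission bits are not stored. Under that rule, 664 on disk against 644 in the index is expected rather than a mismatch. The sketch below mirrors that rule for regular files only (symlinks and submodules are ignored); `canonical_mode` is a made-up name and GNU `stat` is assumed.

```shell
# Print the mode Git would record in the index for a regular file.
# Usage: canonical_mode FILE
canonical_mode() {
    perms=$(stat -c %a -- "$1") || return 1   # e.g. 664 or 777
    # Only the owner-execute bit (octal 0100) matters.
    if [ $(( 0$perms & 0100 )) -ne 0 ]; then
        echo 100755
    else
        echo 100644
    fi
}
```

With `core.filemode` set to false, as in this repository, Git additionally ignores differences in the executable bit, so files that are 777 on disk should not count as modified for that reason alone.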
But alas, backing up the index and changing the mode field to what stat
actually reports didn't help either. It still seems to me like git
should be updating this stuff on its own if it needs to keep track of
it, but whatever, the issue seems to lie somewhere else.
All in all, definitely seems like a git bug to me. Especially
considering the name length corruption. I guess I'll try to check out
the git sources sometime in the future and play around with them, maybe
I'll find something then. For now, I will use the Linux-native checkout
of my repo and be careful to synchronize the two checkouts via remote
and not forget any unpushed commits. The crime was not perfect after all =(
* Re: Committing crimes with NTFS-3G
2024-08-30 16:28 ` Junio C Hamano
@ 2024-09-03 15:58 ` Torsten Bögershausen
2024-09-03 17:30 ` Roman Sandu
0 siblings, 1 reply; 14+ messages in thread
From: Torsten Bögershausen @ 2024-09-03 15:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, Roman Sandu, git
On Fri, Aug 30, 2024 at 09:28:17AM -0700, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
>
> > Am 29.08.24 um 22:43 schrieb Roman Sandu:
> >> To diagnose the problem, I ran git status with GIT_TRACE_PERFORMANCE
> >> enabled, and what I see is that the "refresh index" region is taking up
> >> 99% of the time.
> >
> > Of course. The stat information that Git on Linux caches in the index is
> > vastly different from that that Git for Windows caches. So every time
> > you switch OS, all files appear modified to Git.
> >
> > I suggest that you don't switch OS on a whim and take the 8 seconds
> > delay once when you have to.
>
> I somehow got an impression that the hit is not just that we need to
> adjust cached lstat information in the index file once to the new
> filesystem implementation after an OS switch, but every time (as if
> we are forced to be extra careful and rehash every time until the
> things improve, somewhat like how we handle the racy-Git situation).
> Timestamps given by these OSes are not consistent and the clock
> appears to have rewound, or something?
>
> Timestamps of files in the working tree ordinarily should match
> timestamps in the cached lstat information of these paths in the
> index, and timestamp of the index file itself should be newer than
> any of the above, or the racy-Git prevention code may tell us to
> play safe.
>
> I do not do Windows and/or NTFS, but I have to wonder if the smudge
> filters (including the EOL conversion) play a role in this situation
> as the working tree is getting switched between LF-native and
> CRLF-native systems. Might there be situations where the system must
> spend time only to realize that there is nothing it needs to do to
> canonicalize the file contents and there are no modifications between
> the HEAD commit, the index, and the working tree files, or something
> silly like that?
Now, that makes me wonder:
What are the settings for core.autocrlf?
Is there a .gitattributes file?
In my experience there should be one when working cross-platform.
A version with the single line
  * text=auto
works for many projects, and may be fine-tuned to what you need.
The question about core.filemode has been raised elsewhere; it should
be false for this repo (and probably is).
Now back to the lstat() question:
There is
  git ls-files --debug
which may give more insight into what is going on.
And back to the line endings:
does
  git ls-files --eol
give any hints, maybe?
* Re: Committing crimes with NTFS-3G
2024-09-03 15:58 ` Torsten Bögershausen
@ 2024-09-03 17:30 ` Roman Sandu
2024-09-05 19:51 ` Torsten Bögershausen
0 siblings, 1 reply; 14+ messages in thread
From: Roman Sandu @ 2024-09-03 17:30 UTC (permalink / raw)
To: Torsten Bögershausen, Junio C Hamano; +Cc: Johannes Sixt, git
.gitattributes is this
https://github.com/GaijinEntertainment/DagorEngine/blob/main/.gitattributes
`core.autocrlf` is unset.
core.filemode is false.
`git ls-files --debug` is extremely useful and I am a tiny bit salty
that no one suggested it before I spent time learning how to parse the
index file in C =)
But alas, nothing new, see my previous messages (hopefully they are
visible to you, I am kind of new to mailing lists).
`git ls-files --eol` says that worktree uses crlf (which is expected, I
checked out the repo under windows), while the index uses lf. Can that
be the reason for Linux git thinking that the index is always out of
date? Various dev tools work perfectly fine with crlf'd files on Linux.
* Re: Committing crimes with NTFS-3G
2024-09-03 17:30 ` Roman Sandu
@ 2024-09-05 19:51 ` Torsten Bögershausen
0 siblings, 0 replies; 14+ messages in thread
From: Torsten Bögershausen @ 2024-09-05 19:51 UTC (permalink / raw)
To: Roman Sandu; +Cc: Junio C Hamano, Johannes Sixt, git
On Tue, Sep 03, 2024 at 08:30:16PM +0300, Roman Sandu wrote:
Side note: please avoid top-posting on this list.
> .gitattributes is this
> https://github.com/GaijinEntertainment/DagorEngine/blob/main/.gitattributes
> `core.autocrlf` is unset.
That is according to the book, good.
>
> core.filemode is false.
>
> `git ls-files --debug` is extremely useful and I am a tiny bit salty that no
> one suggested it before I spent time learning how to parse the index file in
> C =)
No well, I read this as "thanks for the hint"
> But alas, nothing new, see my previous messages (hopefully they are visible
> to you, I am kind of new to mailing lists).
Yes, everything is here.
>
> `git ls-files --eol` says that worktree uses crlf (which is expected, I
> checked out the repo under windows), while the index uses lf. Can that be
> the reason for Linux git thinking that the index is always out of date?
> Various dev tools work perfectly fine with crlf'd files on Linux.
No, that should not cause any performance issues.
I did set up a test, of sorts: exporting a Git repo from a Linux NAS to
a macOS laptop over Samba and WLAN.
A simple `git status` takes 4 seconds for 1800 files.
The thing is that Git needs to lstat() each and every file to find
out whether it has been changed.
And if that lstat() call is slow, then `git status` will be slow.
I don't know whether the fsmonitor works for your setup:
https://git-scm.com/docs/git-fsmonitor--daemon
[snip]