git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Git does not handle changing inode numbers well
@ 2012-08-08 15:22 Matthijs Kooijman
  2012-08-08 17:53 ` Junio C Hamano
  2012-08-08 18:07 ` Matthijs Kooijman
  0 siblings, 2 replies; 4+ messages in thread
From: Matthijs Kooijman @ 2012-08-08 15:22 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3306 bytes --]

(Please CC me, I'm not on the list)

Hi folks,

I've spent some time debugging an issue and I'd like to share the
results. The conclusion of my debugging is that git does not currently
handle changing inode numbers on files well.

I have a custom Fuse filesystem, and fuse dynamically allocates inode
numbers to paths, but keeps a limited cache of inode -> name mappings,
causing the inodes to change over time.

Now of course, you'll probably say, "it's the filesystem's fault, git
can't be expected to cope with that". You'll be right of course, but
since I already spent the time digging into this and figuring out what
goes on inside git in this case, I thought I might as well share the
analysis, just in case someone sees an easy fix in here, or in case
someone else stumbles upon this problem as well.

So, the actual problem I was seeing is that running "git status" showed
all symlinks as "modified", even though they really were identical
between the working copy, index and HEAD. Interestingly enough this only
happened when running "git status" without further arguments, when
running on a subdirectory, it would show no changes as expected.

I compared the output of stat to a hexdump of the index file and found
that everything matched, except for the inode numbers. I originally
thought I was misinterpreting what I saw, but gdb confirmed that it were
indeed the inode numbers that git observed as different.

Now, I could have stopped here and started trying to fix my filesystem
instead. But it was still weird that this problem only existed for
symlinks and that normal files acted as expected. So I dug in a bit
deeper, hoping to find some way to make this work for symlinks as well.

So, here's what happens (IIUC):
 - cmd_status calls refresh_index, which calls refresh_cache_ent for
   every entry in the index.
 - refresh_cache_ent notices that the inode number has changed (for both
   symlinks and regular files) and compares the file / symlink contents.
 - refresh_cache_ent sees the content hasn't changed, so it calls
   fill_stat_cache_info to update the stat info.
 - fill_stat_cache_info sets the EC_UPTODATE flag on the entry, but only
   if it is a regular file.
 - cmd_status calls wt_status_collect which calls
   wt_status_collect_changes_worktree which calls run_diff_files.
 - run_diff_files skips regular files, because of the EC_UPTODATE flag.
   For symlinks, however, it checks the stat info and notices that the
   inode number has changed (again). It does not do a content check at
   this point, but instead just outputs the file as "modified".


It turned out that the reason running "git status" on a subdirectory did
appear to work, was that the number of files in the subdir wasn't big
enough to overflow the inode number cache fuse keeps, so that numbers
didn't change in this case (the problem _did_ occur when trying a bigger
subdirectory).

So, it seems that git just doesn't cope well with changing inode numbers
because it checks the content in a first pass in refresh_index, but only
checks the stat info in the second pass in run_diff_files. The reason it
does work for regular files is EC_UPTODATE optimization introduced in
eadb5831: Avoid running lstat(2) on the same cache entry.

So, let's see if I can fix my filesystem now ;-)

Gr.

Matthijs

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Git does not handle changing inode numbers well
  2012-08-08 15:22 Git does not handle changing inode numbers well Matthijs Kooijman
@ 2012-08-08 17:53 ` Junio C Hamano
  2012-08-08 18:34   ` Matthijs Kooijman
  2012-08-08 18:07 ` Matthijs Kooijman
  1 sibling, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2012-08-08 17:53 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

> So, it seems that git just doesn't cope well with changing inode numbers
> because it checks the content in a first pass in refresh_index, but only
> checks the stat info in the second pass in run_diff_files. The reason it
> does work for regular files is EC_UPTODATE optimization introduced in
> eadb5831: Avoid running lstat(2) on the same cache entry.
>
> So, let's see if I can fix my filesystem now ;-)

True.  We have knobs to cope with filesystems whose st_dev or
st_ctime are not stable, but there is no such knob to tweak for
st_ino.  Shouldn't be too hard to add such, though.  One approach is
to do something like the attached patch, and declare, define,
initialize, and set trust_inum in a way similar to how we handle
trust_ctime in the existing code.

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 2f8159f..6da99af 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -210,7 +210,7 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 	if (ce->ce_uid != (unsigned int) st->st_uid ||
 	    ce->ce_gid != (unsigned int) st->st_gid)
 		changed |= OWNER_CHANGED;
-	if (ce->ce_ino != (unsigned int) st->st_ino)
+	if (trust_inum && ce->ce_ino != (unsigned int) st->st_ino)
 		changed |= INODE_CHANGED;
 
 #ifdef USE_STDEV

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Git does not handle changing inode numbers well
  2012-08-08 15:22 Git does not handle changing inode numbers well Matthijs Kooijman
  2012-08-08 17:53 ` Junio C Hamano
@ 2012-08-08 18:07 ` Matthijs Kooijman
  1 sibling, 0 replies; 4+ messages in thread
From: Matthijs Kooijman @ 2012-08-08 18:07 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

> So, let's see if I can fix my filesystem now ;-)
For anyone interested: turns out passing -o noforget makes fuse keep a
persistent path -> inode mapping (at the cost of memory usage, of
course).

However, it also turns out that fuse wasn't my problem: It was the aufs
mount that was overlayed over my fuse mount (this was on a Debian live
system), which sets the noxino option that prevents aufs from keeping
persistent inode numbers.

To get git status working as expected, I had to both remove noxino from
the aufs mount and add noforget to the underlying fuse mount.

Gr.

Matthijs

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Git does not handle changing inode numbers well
  2012-08-08 17:53 ` Junio C Hamano
@ 2012-08-08 18:34   ` Matthijs Kooijman
  0 siblings, 0 replies; 4+ messages in thread
From: Matthijs Kooijman @ 2012-08-08 18:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

Hi Junio,

> -	if (ce->ce_ino != (unsigned int) st->st_ino)
> +	if (trust_inum && ce->ce_ino != (unsigned int) st->st_ino)
>  		changed |= INODE_CHANGED;

I just tried this with 1.7.10 (that is, I deleted these two lines to
mimic trust_inum being false) and it indeed fixes my problem.

(I'll probably won't be implementing the full patch, though, I've
already figured out how to fix my filesystem instead)

Gr.

Matthijs

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-08-08 18:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-08 15:22 Git does not handle changing inode numbers well Matthijs Kooijman
2012-08-08 17:53 ` Junio C Hamano
2012-08-08 18:34   ` Matthijs Kooijman
2012-08-08 18:07 ` Matthijs Kooijman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).