git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: torvalds@linux-foundation.org
Cc: git@vger.kernel.org
Subject: An interaction with ce_match_stat_basic() and autocrlf
Date: Tue, 08 Jan 2008 04:12:24 -0800	[thread overview]
Message-ID: <7vfxx8tt1z.fsf@gitster.siamese.dyndns.org> (raw)

There is an interesting interaction with the stat matching and
autocrlf.

    $ git init
    $ git config core.autocrlf true
    $ echo a >a.txt
    $ git add a.txt
    $ unix2dos a.txt
    $ git diff
    diff --git a/a.txt b/a.txt

At this point, the index records a blob with LF line ending,
while the work tree file has the same content with CRLF line
ending.  And the funny thing is that once you get into this
situation it is unfixable short of "git add a.txt".  Most
notably, "git update-index --refresh" (and the equilvalent
auto-refresh that is implicitly run by "git diff" Porcelain)
will not update the cached stat information.

This is caused partly by the breakage in size_only codepath of
diff.c::diff_populate_filespec().  When taking the file contents
from the work tree, it just gets stat data and thinks it got the
final size, but it should actually convert the blob data into
canonical format.  diff.c::diffcore_skip_stat_unmatch() is
fooled by this and declares that the path is modified.

This can be fixed by not returning early even when size_only is
asked in the codepath.  It will make everything quite a lot more
expensive, as there currently is not a cheap way to ask "is this
path going to be munged by autocrlf or clean filter", but
getting the correct result is more important than getting a
quick but wrong result.

But that is just a half of the story.

 (1) It won't make the entry stat clean, as refresh_index()
     later called from builtin-diff.c to clean up the stat
     dirtiness works without paying attention to the autocrlf
     conversion.

 (2) It won't help lower-level diff-files and internal callers
     to ce_match_stat() that checks if the path were touched.
     The "read-tree -m -u" codepath uses it to avoid touching
     the path with local modifications.  The standard way to
     clear the stat-dirtiness with "git update-index --refresh"
     still needs to be fixed anyway.

I was going to conclude this message by saying "I need to sleep
on this to see if I can come up with a clean solution", but it
appears I do not have much time left for actually sleeping X-<.

             reply	other threads:[~2008-01-08 12:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-08 12:12 Junio C Hamano [this message]
2008-01-08 16:10 ` An interaction with ce_match_stat_basic() and autocrlf Linus Torvalds
2008-01-08 18:04   ` Junio C Hamano
2008-01-10  2:11   ` Junio C Hamano
2008-01-08 17:12 ` Pēteris Kļaviņš
2008-01-08 17:30   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vfxx8tt1z.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).