From: "Torsten Bögershausen" <tboegi@web.de>
To: "Junio C Hamano" <gitster@pobox.com>,
"Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge
Date: Thu, 7 Jul 2016 19:16:09 +0200 [thread overview]
Message-ID: <c36fe487-b8dc-9767-7fae-bee513dac0b2@web.de> (raw)
In-Reply-To: <xmqqa8huvmpv.fsf@gitster.mtv.corp.google.com>
On 2016-07-06 16.57, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>> At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
>> to check if the path named "file" is clean.
>> According to the tree information, the path "file" has the sha1 99b633.
>> #Note:
>> #sha1 99b633 is "first line\nsame line\n"
>
> I thought your scenario was that our side had CRLF both in the blob
> and in the working tree. So this is a different example? Our side
> has LF in the blob that is checked out with CRLF in the working tree,
> and their side has CRLF in the blob?
>
That was probably to confuse myself, and the rest of the world,
sorry for confusion.
This is the callstack:
merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1,
const char *path, int stage, int refresh, int options)
{
struct cache_entry *ce;
ce = make_cache_entry
if (!ce)
return error(_("addinfo_cache failed for path '%s'"), path);
return add_cache_entry(ce, options);
}
#Calls
read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)
struct cache_entry *make_cache_entry(unsigned int mode,
const unsigned char *sha1, const char *path, int stage,
unsigned int refresh_options)
{
[snip]
ret = refresh_cache_entry(ce, refresh_options);
if (ret != ce)
free(ce);
return ret;
}
#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);
#Calls
ie_modified()
#Calls
read-cache.c/ie_match_stat()
#Calls
changed |= ce_modified_check_fs(ce, st);
#Calls
ce_compare_data(path=file sha1=0x99b633)
#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here, the parameter sha is the output.
Deep down, has_cr_in_index(path) will look at ad55e2 instead,
which is wrong.
>> ce_compare_data() starts the work:
>> OK, lets run index_fd(...,ce->name,...)
>> # index_fd will simulate a "git add" and return the sha1 (via the sha1 pointer)
>> # after the hashing.
>>
>> # ce_compare_data() will later compare ce->sha1 with the result stored in
>> # the local sha1. That's why a sha1 is in the parameter list.
>> # To return the resulting hash:
>>
>> ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)
>>
>> #Down in index_fd():
>>
>> sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
>> index_core() reads the file from the working tree into memory using
>> read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
>> to calculate the sha1.
>>
>> Before the hashing is done, index_mem() runs
>> convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
>> to convert "blobs to git internal format".
>>
>>
>> Here, convert_to_git() consults the .gitattributes (again) to find out that
>> crlf_action is CRLF_AUTO in our case.
>> The "new safer autocrlf handling" says that if a blob as any CR, don't convert
>> CRLFs at "git add".
>>
>> convert.c/crlf_to_git() starts to do it's job:
>> - look at buf (It has CRLF, conversion may be needed)
>> - consult blob_has_cr()
>> # Note: Before this patch, has_cr_in_index(path)) was used
>>
>> #Again, before this patch,
>> has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
>> blob into memory.
>>
>> Now read_blob_data_from_cache() is a macro, and we end up in
>> read_blob_data_from_index(&the_index, (path), (sz))
>>
>> read-cache.c/read_blob_data_from_index() starts its work:
>> pos = index_name_pos(istate, path, len);
>> if (pos < 0) {
>> /*
>> * We might be in the middle of a merge, in which
>> * case we would read stage #2 (ours).
>> */
>>
>> # And here, and this is important to notice, "ours" is sha1 ad55e2,
>> # which corresponds to "first line\r\nsame line\r\n"
>
> Where did 99b633 come from then? There still is something missing
> in this description.
>
> Puzzled...
This is an unfinished attempt for a commit message:
--------------------------------------------------
correct ce_compare_data() at the end of a merge
The following didn't work as expected:
- At the end of a merge
- merge.renormalize is true,
- .gitattributes = "* text=auto"
- core.eol = crlf
Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".
The expected result of the merge is "first line\nsame line\n".
The content in the working tree is "first line\r\nsame line\r\n",
and ce_compare_data() should find that the content is clean and return 0.
The following callstack does not work:
merge-recursive.c/add_cacheinfo(path=file sha1=0x99b633)
#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);
#Calls
ie_modified()
#Calls
read-cache.c/ie_match_stat()
#Calls
changed |= ce_modified_check_fs(ce, st);
#Calls
ce_compare_data(path=file sha1=0x99b633)
#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here.
#Calls
index_core()
index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.
Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert "blobs to git internal format".
Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".
convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
# Note: Before this patch, has_cr_in_index(path)) was used
#Before this patch,
has_cr_in_index(path)
#Calls
read_blob_data_from_cache(path, &sz) to read the blob into memory.
Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))
read-cache.c/read_blob_data_from_index() starts its work:
pos = index_name_pos(istate, path, len);
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
* case we would read stage #2 (ours).
*/
# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"
The result of the check is that the blob 99b633 doesn't seem
to match the working tree, and the whold merge is aborted:
error: addinfo_cache failed for path 'file'
Solution:
Make sure that ce_compare_data() forwards the source sha1 into crlf_to_git().
Replace has_cr_in_index(path) with blob_has_cr(sha1), and forward
the source sha1 from ce_compare_data() into blob_has_cr(sha1).
While at it, rename has_cr_in_index() into blob_has_cr()
and replace 0 with SAFE_CRLF_FALSE.
Add a TC in t6038 to have a test coverage under Linux.
next prev parent reply other threads:[~2016-07-07 17:20 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-27 23:21 What's cooking in git.git (Jun 2016, #09; Mon, 27) Junio C Hamano
2016-06-28 8:01 ` [PATCH v3 0/3] unified auto CRLF handling, V3 tboegi
2016-06-28 8:01 ` [PATCH v3 1/3] convert: unify the "auto" handling of CRLF tboegi
2016-06-28 8:01 ` [PATCH v3 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-06-28 8:01 ` [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge tboegi
2016-06-29 16:14 ` Junio C Hamano
2016-06-30 16:52 ` Torsten Bögershausen
2016-07-01 22:11 ` Junio C Hamano
2016-07-02 18:41 ` Torsten Bögershausen
2016-07-06 14:57 ` Junio C Hamano
2016-07-07 17:16 ` Torsten Bögershausen [this message]
2016-07-07 18:43 ` Junio C Hamano
2016-07-07 22:19 ` Junio C Hamano
2016-07-08 7:52 ` Torsten Bögershausen
2016-07-08 16:36 ` Junio C Hamano
2016-07-08 17:13 ` Torsten Bögershausen
2016-07-08 17:25 ` Junio C Hamano
2016-07-08 17:59 ` Junio C Hamano
2016-07-08 19:01 ` Junio C Hamano
2016-07-08 20:50 ` Junio C Hamano
2016-07-11 20:07 ` Junio C Hamano
2016-07-12 2:23 ` Torsten Bögershausen
2016-07-12 19:54 ` Junio C Hamano
2016-07-08 15:00 ` Torsten Bögershausen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c36fe487-b8dc-9767-7fae-bee513dac0b2@web.de \
--to=tboegi@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.