* [PATCH 1/2] t/t8006: Demonstrate blame is broken when cachetextconv is on @ 2010-12-18 14:54 Kirill Smelkov 2010-12-18 14:54 ` [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid Kirill Smelkov 0 siblings, 1 reply; 13+ messages in thread From: Kirill Smelkov @ 2010-12-18 14:54 UTC (permalink / raw) To: Junio C Hamano Cc: git, Kirill Smelkov, Axel Bonnet, Clément Poulain, Diane Gasselin, Jeff King I have a git repository with lots of .doc and .pdf files. There diff works ok, but blaming is painfully slow without textconv cache, and with textconv cache, blame says lots of lines are 'Not Yet Committed' which is wrong. Here is a test that demonstrates the problem. Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr> Cc: Clément Poulain <clement.poulain@ensimag.imag.fr> Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr> Cc: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> --- t/t8006-blame-textconv.sh | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh index dbf623b..fe90541 100755 --- a/t/t8006-blame-textconv.sh +++ b/t/t8006-blame-textconv.sh @@ -73,6 +73,28 @@ test_expect_success 'blame --textconv going through revisions' ' test_cmp expected result ' +test_expect_success 'setup +cachetextconv' ' + git config diff.test.cachetextconv true +' + +cat >expected_one <<EOF +(Number2 2010-01-01 20:00:00 +0000 1) converted: test 1 version 2 +EOF + +# one.bin is blamed as 'Not Committed yet' +test_expect_failure 'blame --textconv works with textconvcache' ' + git blame --textconv two.bin >blame && + find_blame <blame >result && + test_cmp expected result && + git blame --textconv one.bin >blame && + find_blame <blame >result && + test_cmp expected_one result +' + +test_expect_success 'setup -cachetextconv' ' + git config diff.test.cachetextconv false +' + test_expect_success 'make a new commit' ' echo "bin: test number 2 version 3" >>two.bin && GIT_AUTHOR_NAME=Number3 git commit -a -m Third --date="2010-01-01 22:00:00" -- 1.7.3.4.570.g14308 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 14:54 [PATCH 1/2] t/t8006: Demonstrate blame is broken when cachetextconv is on Kirill Smelkov @ 2010-12-18 14:54 ` Kirill Smelkov 2010-12-18 16:13 ` Jeff King 2010-12-20 2:32 ` Junio C Hamano 0 siblings, 2 replies; 13+ messages in thread From: Kirill Smelkov @ 2010-12-18 14:54 UTC (permalink / raw) To: Junio C Hamano Cc: git, Kirill Smelkov, Axel Bonnet, Clément Poulain, Diane Gasselin, Jeff King It turned out, under blame there are requests to fill_textconv() with sha1=0000000000000000000000000000000000000000 and sha1_valid=0. As the code did not analyzed sha1 validity, we ended up putting 000000 into textconv cache which was fooling later blames to discover lots of lines in 'Not Yet Committed' state. Fix it. Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr> Cc: Clément Poulain <clement.poulain@ensimag.imag.fr> Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr> Cc: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> --- diff.c | 4 ++-- t/t8006-blame-textconv.sh | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 0a43869..5422c43 100644 --- a/diff.c +++ b/diff.c @@ -4412,7 +4412,7 @@ size_t fill_textconv(struct userdiff_driver *driver, return df->size; } - if (driver->textconv_cache) { + if (driver->textconv_cache && df->sha1_valid) { *outbuf = notes_cache_get(driver->textconv_cache, df->sha1, &size); if (*outbuf) @@ -4423,7 +4423,7 @@ size_t fill_textconv(struct userdiff_driver *driver, if (!*outbuf) die("unable to read files to diff"); - if (driver->textconv_cache) { + if (driver->textconv_cache && df->sha1_valid) { /* ignore errors, as we might be in a readonly repository */ notes_cache_put(driver->textconv_cache, df->sha1, *outbuf, size); diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh index fe90541..ea64cd8 100755 --- a/t/t8006-blame-textconv.sh +++ b/t/t8006-blame-textconv.sh @@ -81,8 +81,7 @@ cat >expected_one <<EOF (Number2 2010-01-01 20:00:00 +0000 1) converted: test 1 version 2 EOF -# one.bin is blamed as 'Not Committed yet' -test_expect_failure 'blame --textconv works with textconvcache' ' +test_expect_success 'blame --textconv works with textconvcache' ' git blame --textconv two.bin >blame && find_blame <blame >result && test_cmp expected result && -- 1.7.3.4.570.g14308 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 14:54 ` [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid Kirill Smelkov @ 2010-12-18 16:13 ` Jeff King 2010-12-18 20:55 ` Kirill Smelkov 2010-12-20 2:26 ` Junio C Hamano 2010-12-20 2:32 ` Junio C Hamano 1 sibling, 2 replies; 13+ messages in thread From: Jeff King @ 2010-12-18 16:13 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain, Diane Gasselin On Sat, Dec 18, 2010 at 05:54:12PM +0300, Kirill Smelkov wrote: > It turned out, under blame there are requests to fill_textconv() with > sha1=0000000000000000000000000000000000000000 and sha1_valid=0. > > As the code did not analyzed sha1 validity, we ended up putting 000000 > into textconv cache which was fooling later blames to discover lots of > lines in 'Not Yet Committed' state. > [...] > - if (driver->textconv_cache) { > + if (driver->textconv_cache && df->sha1_valid) { > *outbuf = notes_cache_get(driver->textconv_cache, df->sha1, > &size); In short: Acked-by: Jeff King <peff@peff.net> But it took some thinking to convince myself, so the long answer is below if anyone cares. I was dubious at first that this could be the right solution. We still end up putting the filespec through run_textconv, which didn't seem right if it is not valid. But reading into it more, there are two levels of invalidity: 1. !DIFF_FILE_VALID(df) - we are not a valid file at all. I.e., we are /dev/null. 2. !df->sha1_valid - we are pointing to a working tree file whose sha1 we don't know I think level (2) never happens at all in the regular diff code, which is why this case was completely unhandled. But it is OK in that case (required, even) to put the contents through run_textconv. In theory we could actually calculate the sha1 in case (2) and cache under that, but I don't know how much it would buy us in practice. It saves us running the textconv filter at the expense of computing the sha1. Which is probably a win for most filters, but on the other hand, it is the wrong place to compute such a sha1. If it is a working tree file, we should ideally update our stat info in the index so that the info can be reused. -Peff PS It is a little disturbing that in fill_textconv, we handle case (1), !DIFF_FILE_VALID for the non-textconv case, but not so for the textconv case. I think we are OK, as get_textconv will never load a textconv driver for a !DIFF_FILE_VALID filespec, so we always follow the non-textconv codepath in that case. But I am tempted to do this just to be more defensive: diff --git a/diff.c b/diff.c index b0ee213..5320849 100644 --- a/diff.c +++ b/diff.c @@ -4404,22 +4404,25 @@ size_t fill_textconv(struct userdiff_driver *driver, if (!driver || !driver->textconv) { if (!DIFF_FILE_VALID(df)) { *outbuf = ""; return 0; } if (diff_populate_filespec(df, 0)) die("unable to read files to diff"); *outbuf = df->data; return df->size; } + if (!DIFF_FILE_VALID(df)) + die("BUG: attempt to textconv an invalid filespec"); + if (driver->textconv_cache) { *outbuf = notes_cache_get(driver->textconv_cache, df->sha1, &size); if (*outbuf) return size; } *outbuf = run_textconv(driver->textconv, df, &size); if (!*outbuf) die("unable to read files to diff"); ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 16:13 ` Jeff King @ 2010-12-18 20:55 ` Kirill Smelkov 2010-12-19 3:23 ` Junio C Hamano 2010-12-20 2:26 ` Junio C Hamano 1 sibling, 1 reply; 13+ messages in thread From: Kirill Smelkov @ 2010-12-18 20:55 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, git, Axel Bonnet, Cl??ment Poulain, Diane Gasselin On Sat, Dec 18, 2010 at 11:13:37AM -0500, Jeff King wrote: > On Sat, Dec 18, 2010 at 05:54:12PM +0300, Kirill Smelkov wrote: > > > It turned out, under blame there are requests to fill_textconv() with > > sha1=0000000000000000000000000000000000000000 and sha1_valid=0. > > > > As the code did not analyzed sha1 validity, we ended up putting 000000 > > into textconv cache which was fooling later blames to discover lots of > > lines in 'Not Yet Committed' state. > > [...] > > - if (driver->textconv_cache) { > > + if (driver->textconv_cache && df->sha1_valid) { > > *outbuf = notes_cache_get(driver->textconv_cache, df->sha1, > > &size); > > In short: > > Acked-by: Jeff King <peff@peff.net> > > But it took some thinking to convince myself, so the long answer is > below if anyone cares. > > I was dubious at first that this could be the right solution. We still > end up putting the filespec through run_textconv, which didn't seem > right if it is not valid. > > But reading into it more, there are two levels of invalidity: > > 1. !DIFF_FILE_VALID(df) - we are not a valid file at all. I.e., we are > /dev/null. > > 2. !df->sha1_valid - we are pointing to a working tree file whose sha1 > we don't know > > I think level (2) never happens at all in the regular diff code, which > is why this case was completely unhandled. But it is OK in that case > (required, even) to put the contents through run_textconv. > > In theory we could actually calculate the sha1 in case (2) and cache > under that, but I don't know how much it would buy us in practice. It > saves us running the textconv filter at the expense of computing the > sha1. Which is probably a win for most filters, but on the other hand, > it is the wrong place to compute such a sha1. If it is a working tree > file, we should ideally update our stat info in the index so that the > info can be reused. Jeff, Thanks for your ACK and for the explanation. My last patches to git were blame related so semi-intuitively I knew that invalid sha1 are coming from files in worktree. Your description makes things much more clear and I'd put it into patch log as well. What is the best practice for this? For me to re-roll, or for Junio to merge texts? Also experimenting shows that, as you say, regular diff does not have this problem, and also that diff calculates sha1 for files in worktree and so their textconv results are cached. So clearly, there are some behaviour differences between diff and blame. Thanks again, Kirill ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 20:55 ` Kirill Smelkov @ 2010-12-19 3:23 ` Junio C Hamano 2010-12-19 12:10 ` Kirill Smelkov 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2010-12-19 3:23 UTC (permalink / raw) To: Kirill Smelkov Cc: Jeff King, git, Axel Bonnet, Clément Poulain, Diane Gasselin Kirill Smelkov <kirr@landau.phys.spbu.ru> writes: > Thanks for your ACK and for the explanation. > > My last patches to git were blame related so semi-intuitively I knew > that invalid sha1 are coming from files in worktree. Your description > makes things much more clear and I'd put it into patch log as well. > What is the best practice for this? For me to re-roll, or for Junio to > merge texts? Re-rolling to explain changes in your own words is preferred; thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-19 3:23 ` Junio C Hamano @ 2010-12-19 12:10 ` Kirill Smelkov 2010-12-20 2:41 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Kirill Smelkov @ 2010-12-19 12:10 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, git, Axel Bonnet, Cl??ment Poulain, Diane Gasselin On Sat, Dec 18, 2010 at 07:23:29PM -0800, Junio C Hamano wrote: > Kirill Smelkov <kirr@landau.phys.spbu.ru> writes: > > > Thanks for your ACK and for the explanation. > > > > My last patches to git were blame related so semi-intuitively I knew > > that invalid sha1 are coming from files in worktree. Your description > > makes things much more clear and I'd put it into patch log as well. > > What is the best practice for this? For me to re-roll, or for Junio to > > merge texts? > > Re-rolling to explain changes in your own words is preferred; thanks. I see, thanks. I'm not that familiar with git internals involved, so here is updated patch with added paragraph about "df->sha1_valid=0 means files from worktree with unknown sha1", and appropriate excerpt from Jeff's post. That's the most reasonable I could come up with. Thanks, Kirill P.S. please don't forget to pick patch 1 which is unchanged. ---- 8< ---- From: Kirill Smelkov <kirr@landau.phys.spbu.ru> Date: Sat, 18 Dec 2010 16:27:28 +0300 Subject: [PATCH v2 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It turned out, under blame there are requests to fill_textconv() with sha1=0000000000000000000000000000000000000000 and sha1_valid=0. As the code did not analyzed sha1 validity, we ended up putting 000000 into textconv cache which was fooling later blames to discover lots of lines in 'Not Yet Committed' state. In practice df->sha1_valid=0 means blame requests to run textconv on a file in worktree whose sha1 is not know yet. Fix it. On Sat, Dec 18, 2010 at 11:13:37AM -0500, Jeff King wrote: > > In short: > > Acked-by: Jeff King <peff@peff.net> > > But it took some thinking to convince myself, so the long answer is > below if anyone cares. > > I was dubious at first that this could be the right solution. We still > end up putting the filespec through run_textconv, which didn't seem > right if it is not valid. > > But reading into it more, there are two levels of invalidity: > > 1. !DIFF_FILE_VALID(df) - we are not a valid file at all. I.e., we are > /dev/null. > > 2. !df->sha1_valid - we are pointing to a working tree file whose sha1 > we don't know > > I think level (2) never happens at all in the regular diff code, which > is why this case was completely unhandled. But it is OK in that case > (required, even) to put the contents through run_textconv. Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr> Cc: Clц╘ment Poulain <clement.poulain@ensimag.imag.fr> Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr> Cc: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Acked-by: Jeff King <peff@peff.net> --- diff.c | 4 ++-- t/t8006-blame-textconv.sh | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 0a43869..5422c43 100644 --- a/diff.c +++ b/diff.c @@ -4412,7 +4412,7 @@ size_t fill_textconv(struct userdiff_driver *driver, return df->size; } - if (driver->textconv_cache) { + if (driver->textconv_cache && df->sha1_valid) { *outbuf = notes_cache_get(driver->textconv_cache, df->sha1, &size); if (*outbuf) @@ -4423,7 +4423,7 @@ size_t fill_textconv(struct userdiff_driver *driver, if (!*outbuf) die("unable to read files to diff"); - if (driver->textconv_cache) { + if (driver->textconv_cache && df->sha1_valid) { /* ignore errors, as we might be in a readonly repository */ notes_cache_put(driver->textconv_cache, df->sha1, *outbuf, size); diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh index fe90541..ea64cd8 100755 --- a/t/t8006-blame-textconv.sh +++ b/t/t8006-blame-textconv.sh @@ -81,8 +81,7 @@ cat >expected_one <<EOF (Number2 2010-01-01 20:00:00 +0000 1) converted: test 1 version 2 EOF -# one.bin is blamed as 'Not Committed yet' -test_expect_failure 'blame --textconv works with textconvcache' ' +test_expect_success 'blame --textconv works with textconvcache' ' git blame --textconv two.bin >blame && find_blame <blame >result && test_cmp expected result && -- 1.7.3.4.570.g14308 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-19 12:10 ` Kirill Smelkov @ 2010-12-20 2:41 ` Junio C Hamano 2010-12-20 4:46 ` Jeff King 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2010-12-20 2:41 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Jeff King, git, Axel Bonnet, Cl??ment Poulain, Diane Gasselin Kirill Smelkov <kirr@landau.phys.spbu.ru> writes: > On Sat, Dec 18, 2010 at 07:23:29PM -0800, Junio C Hamano wrote: >> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes: >> >> > Thanks for your ACK and for the explanation. >> > >> > My last patches to git were blame related so semi-intuitively I knew >> > that invalid sha1 are coming from files in worktree. Your description >> > makes things much more clear and I'd put it into patch log as well. >> > What is the best practice for this? For me to re-roll, or for Junio to >> > merge texts? >> >> Re-rolling to explain changes in your own words is preferred; thanks. > > I see, thanks. > > I'm not that familiar with git internals involved, so here is updated > patch with added paragraph about "df->sha1_valid=0 means files from > worktree with unknown sha1", and appropriate excerpt from Jeff's post. > That's the most reasonable I could come up with. > > Thanks, > Kirill > > P.S. please don't forget to pick patch 1 which is unchanged. Here is how I would describe it. commit 87bb04bb760659dd33d7a173333329cd900620a9 Author: Kirill Smelkov <kirr@landau.phys.spbu.ru> Date: Sat Dec 18 17:54:12 2010 +0300 fill_textconv(): Don't get/put cache if sha1 is not valid When blaming files in the working tree, the filespec is marked with !sha1_valid, as we have not given the contents an object name yet. The function to cache textconv results (keyed on the object name), however, didn't check this condition, and ended up on storing the cached result under a random object name. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-20 2:41 ` Junio C Hamano @ 2010-12-20 4:46 ` Jeff King 2010-12-20 19:28 ` Kirill Smelkov 0 siblings, 1 reply; 13+ messages in thread From: Jeff King @ 2010-12-20 4:46 UTC (permalink / raw) To: Junio C Hamano Cc: Kirill Smelkov, git, Axel Bonnet, Cl??ment Poulain, Diane Gasselin On Sun, Dec 19, 2010 at 06:41:22PM -0800, Junio C Hamano wrote: > > I'm not that familiar with git internals involved, so here is updated > > patch with added paragraph about "df->sha1_valid=0 means files from > > worktree with unknown sha1", and appropriate excerpt from Jeff's post. > > That's the most reasonable I could come up with. > [...] > Here is how I would describe it. > > commit 87bb04bb760659dd33d7a173333329cd900620a9 > Author: Kirill Smelkov <kirr@landau.phys.spbu.ru> > Date: Sat Dec 18 17:54:12 2010 +0300 > > fill_textconv(): Don't get/put cache if sha1 is not valid > > When blaming files in the working tree, the filespec is marked with > !sha1_valid, as we have not given the contents an object name yet. The > function to cache textconv results (keyed on the object name), however, > didn't check this condition, and ended up on storing the cached result > under a random object name. > > Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> FWIW, I think that is a good description. -Peff ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-20 4:46 ` Jeff King @ 2010-12-20 19:28 ` Kirill Smelkov 0 siblings, 0 replies; 13+ messages in thread From: Kirill Smelkov @ 2010-12-20 19:28 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, git, Axel Bonnet, Cl??ment Poulain, Diane Gasselin On Sun, Dec 19, 2010 at 11:46:56PM -0500, Jeff King wrote: > On Sun, Dec 19, 2010 at 06:41:22PM -0800, Junio C Hamano wrote: > > > > I'm not that familiar with git internals involved, so here is updated > > > patch with added paragraph about "df->sha1_valid=0 means files from > > > worktree with unknown sha1", and appropriate excerpt from Jeff's post. > > > That's the most reasonable I could come up with. > > [...] > > Here is how I would describe it. > > > > commit 87bb04bb760659dd33d7a173333329cd900620a9 > > Author: Kirill Smelkov <kirr@landau.phys.spbu.ru> > > Date: Sat Dec 18 17:54:12 2010 +0300 > > > > fill_textconv(): Don't get/put cache if sha1 is not valid > > > > When blaming files in the working tree, the filespec is marked with > > !sha1_valid, as we have not given the contents an object name yet. The > > function to cache textconv results (keyed on the object name), however, > > didn't check this condition, and ended up on storing the cached result > > under a random object name. > > > > Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> > > FWIW, I think that is a good description. Junio, Jeff, thanks for re-wording it. Though I think my v2 text was saying the same, only with more info + examples. My english is pretty bad this days, so I kind of understand why it was tempting to be redone :) Thanks anyway, and for picking this into next, Kirill P.S. somehow 'Acked-by: Jeff King <peff@peff.net>' was dropped. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 16:13 ` Jeff King 2010-12-18 20:55 ` Kirill Smelkov @ 2010-12-20 2:26 ` Junio C Hamano 2010-12-20 4:42 ` Jeff King 1 sibling, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2010-12-20 2:26 UTC (permalink / raw) To: Jeff King Cc: Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin Jeff King <peff@peff.net> writes: > PS It is a little disturbing that in fill_textconv, we handle > case (1), !DIFF_FILE_VALID for the non-textconv case, but not so for the > textconv case. I think we are OK, as get_textconv will never load a > textconv driver for a !DIFF_FILE_VALID filespec, so we always follow the > non-textconv codepath in that case. But I am tempted to do this just to > be more defensive: FILE_VALID() is about "does that side have a blob there, or is this create/delete diff?", so the caller should be handling this properly as you said, but your fill_textconv() already prepares for the case where the caller for some reason calls this function with "no blob on this side" and returns an empty string (see the precontext of your patch). I think it is fine to be defensive to prepare for such a case, but then dying like this patch does is inconsistent. Perhaps we should move the new check higher and remove the *outbuf = "" while at it? > diff --git a/diff.c b/diff.c > index b0ee213..5320849 100644 > --- a/diff.c > +++ b/diff.c > @@ -4404,22 +4404,25 @@ size_t fill_textconv(struct userdiff_driver *driver, > if (!driver || !driver->textconv) { > if (!DIFF_FILE_VALID(df)) { > *outbuf = ""; > return 0; > } > if (diff_populate_filespec(df, 0)) > die("unable to read files to diff"); > *outbuf = df->data; > return df->size; > } > > + if (!DIFF_FILE_VALID(df)) > + die("BUG: attempt to textconv an invalid filespec"); > + ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-20 2:26 ` Junio C Hamano @ 2010-12-20 4:42 ` Jeff King 2010-12-20 8:42 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Jeff King @ 2010-12-20 4:42 UTC (permalink / raw) To: Junio C Hamano Cc: Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin On Sun, Dec 19, 2010 at 06:26:55PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > PS It is a little disturbing that in fill_textconv, we handle > > case (1), !DIFF_FILE_VALID for the non-textconv case, but not so for the > > textconv case. I think we are OK, as get_textconv will never load a > > textconv driver for a !DIFF_FILE_VALID filespec, so we always follow the > > non-textconv codepath in that case. But I am tempted to do this just to > > be more defensive: > > FILE_VALID() is about "does that side have a blob there, or is this > create/delete diff?", so the caller should be handling this properly as > you said, but your fill_textconv() already prepares for the case where the > caller for some reason calls this function with "no blob on this side" and > returns an empty string (see the precontext of your patch). > > I think it is fine to be defensive to prepare for such a case, but then > dying like this patch does is inconsistent. Perhaps we should move the > new check higher and remove the *outbuf = "" while at it? I'm not sure returning the empty string for a textconv is the right solution. I am inclined to say that trying to textconv a !DIFF_FILE_VALID is simply an error. More on that in a second. If we were to do anything else, I would think it would be to feed "" to the textconv filter, in case it wanted to do something magical for the create/delete case. For example, imagine a textconv filter which turned a string of bytes like "foo" into a length plus set of converted bytes, like "3: FOO". You would want the /dev/null case to turn into "0: ". Now, this is obviously a ridiculous toy case. I have no idea if anyone would want to do something like that. So far most people have been happy with /dev/null never being textconv'd, and always looking like the empty string. Moreover, even if somebody did want this behavior, 99% of the other filters _wouldn't_ want the behavior. Because programs like odt2txt or exiftool that people _do_ use for textconv filters do not want to be fed /dev/null; they will signal an error. So that is my "if we wanted it to do something useful, this is what it would do" case, and I don't see any real-world evidence that anybody wants that. Now on to being defensive. What we are defending against is a caller marking something as to-be-textconv'd, even though it is !DIFF_FILE_VALID. But what did the caller want? One sensible behavior is what I described above. Or maybe they did just want the empty string. Or more likely, it is simply a bug in the diff code. Since we haven't defined the semantics, I would much rather loudly scream "BUG!" than paper over it by returning what we guess they would have wanted (which may generate subtly wrong results). And then we can think about that use case and decide what the semantics, if any, should be. So I stand by my thought that it should die(). But I don't think there _are_ any such bugs currently, so it probably doesn't matter much either way. I can live with "return 0", or even just leaving it alone. -Peff ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-20 4:42 ` Jeff King @ 2010-12-20 8:42 ` Junio C Hamano 0 siblings, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2010-12-20 8:42 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin Jeff King <peff@peff.net> writes: > On Sun, Dec 19, 2010 at 06:26:55PM -0800, Junio C Hamano wrote: > ... >> FILE_VALID() is about "does that side have a blob there, or is this >> create/delete diff?", so the caller should be handling this properly as >> you said, but your fill_textconv() already prepares for the case where the >> caller for some reason calls this function with "no blob on this side" and >> returns an empty string (see the precontext of your patch). >> >> I think it is fine to be defensive to prepare for such a case, but then >> dying like this patch does is inconsistent. Perhaps we should move the >> new check higher and remove the *outbuf = "" while at it? > > I'm not sure returning the empty string for a textconv is the right > solution.... > > So I stand by my thought that it should die(). But I don't think there > _are_ any such bugs currently, so it probably doesn't matter much either > way. I can live with "return 0", or even just leaving it alone. I must have phrased it badly. I am actually Ok either way (i.e. make this function prepare for a future when we start passing the missing side to the function, and have a special case for "if (!DIFF_FILE_VALID)" and returning something like an empty string, or make this function refuse to be fed the missing side by dying in "if (!DIFF_FILE_VALID)". I was only pointing out that the result of applying your patch does one in one case and another in the other case, which is inconsistent. And we do not know in advance what is the reasonable fallback value for the missing side, so we do not now "something like an empty string" is a reasonable thing to do yet. Hence "move the new check higher and remove ..." was my suggestion, which would look like the attached, which would be consistent with your message I am replying to. IOW, I think we are on the same page. diff.c | 15 +++++++++++---- 1 files changed, 11 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 5422c43..a04ab2f 100644 --- a/diff.c +++ b/diff.c @@ -4401,11 +4401,18 @@ size_t fill_textconv(struct userdiff_driver *driver, { size_t size; + /* + * !DIFF_FILE_VALID(df) means this is a missing side of the + * diff (preimage of creation, or postimage of deletion diff). + * The caller should not try to textconv such a filespec, as + * there is no such blob to begin with! + */ + + if (!DIFF_FILE_VALID(df)) + die("Feeding missing side to fill_textconv?: '%s'", + df->path); + if (!driver || !driver->textconv) { - if (!DIFF_FILE_VALID(df)) { - *outbuf = ""; - return 0; - } if (diff_populate_filespec(df, 0)) die("unable to read files to diff"); *outbuf = df->data; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid 2010-12-18 14:54 ` [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid Kirill Smelkov 2010-12-18 16:13 ` Jeff King @ 2010-12-20 2:32 ` Junio C Hamano 1 sibling, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2010-12-20 2:32 UTC (permalink / raw) To: Kirill Smelkov Cc: git, Axel Bonnet, Clément Poulain, Diane Gasselin, Jeff King Kirill Smelkov <kirr@landau.phys.spbu.ru> writes: > It turned out, under blame there are requests to fill_textconv() with > sha1=0000000000000000000000000000000000000000 and sha1_valid=0. The code shouldn't even look at sha1[] if sha1_valid is false, as we do not know the hash value for the blob (reading from the working tree). Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-12-20 19:28 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-12-18 14:54 [PATCH 1/2] t/t8006: Demonstrate blame is broken when cachetextconv is on Kirill Smelkov 2010-12-18 14:54 ` [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid Kirill Smelkov 2010-12-18 16:13 ` Jeff King 2010-12-18 20:55 ` Kirill Smelkov 2010-12-19 3:23 ` Junio C Hamano 2010-12-19 12:10 ` Kirill Smelkov 2010-12-20 2:41 ` Junio C Hamano 2010-12-20 4:46 ` Jeff King 2010-12-20 19:28 ` Kirill Smelkov 2010-12-20 2:26 ` Junio C Hamano 2010-12-20 4:42 ` Jeff King 2010-12-20 8:42 ` Junio C Hamano 2010-12-20 2:32 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).