* Bug: write-tree corrupts intent-to-add index state @ 2012-11-06 8:53 Jonathon Mah 2012-11-06 12:37 ` Nguyen Thai Ngoc Duy 2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy 0 siblings, 2 replies; 16+ messages in thread From: Jonathon Mah @ 2012-11-06 8:53 UTC (permalink / raw) To: git@vger.kernel.org Hi revisionaries, I came across what appears to be a bug while working today. I had several changes, both staged and unstaged, and two files that I had marked with intent-to-add (git add -N). $ git status -sb ## Library-3.x...origin/Library-3.x [ahead 4] M Library/LIRootSource.m M Library/Library.xcodeproj/project.pbxproj [...] M Library/SceneKit/LISceneKitBookshelfScene.m AM Library/SceneKit/LISceneKitDisplayedMediumProxy.h AM Library/SceneKit/LISceneKitDisplayedMediumProxy.m (The last two files were added with -N.) I then attempted to stash my work, but that failed: $ git stash -k error: Entry 'Library/SceneKit/LISceneKitDisplayedMediumProxy.h' not uptodate. Cannot merge. Cannot save the current worktree state Re-checking the status showed that the intent-to-add files were now simply modified, yet they did not appear in git ls-tree -r HEAD. $ git status -sb ## Library-3.x...origin/Library-3.x [ahead 4] M Library/LIRootSource.m M Library/Library.xcodeproj/project.pbxproj [...] M Library/SceneKit/LISceneKitBookshelfScene.m M Library/SceneKit/LISceneKitDisplayedMediumProxy.h M Library/SceneKit/LISceneKitDisplayedMediumProxy.m It appears the index forgot that those files were new. So not only has the intent-to-add status been lost, but git status shows a file existing in neither HEAD nor the index as not-new-but-modified. Tracing through it, I narrowed it down to git write-tree affecting the index state. As far as I can tell, it's incorrect for that command to change the index. I'm now out of my (shallow) depth in the git codebase, so I'm throwing it out there for someone to tackle (hopefully). I've attached a failing test case. Additional notes for bug hunters: - I've only seen files in already-existing subdirectories lose intent-to-add status, whereas marking a file in the top-level as i-t-a is preserved during write-tree. - The 'git ls-files -s --debug' output doesn't change across the write-tree: 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 dir/baz ctime: 0:0 mtime: 0:0 dev: 0 ino: 0 uid: 0 gid: 0 size: 0 flags: 20004000 --- t/t2203-add-intent.sh | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index ec35409..fcc67c0 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -62,5 +62,27 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' git commit -a -m all ' +test_expect_success 'i-t-a status unaffected by write-tree' ' + git reset --hard && + mkdir dir && + echo frotz >dir/bar && + git add dir && + git commit -m "establish dir" && + echo fizfaz >foo && + echo fragz >dir/baz && + git add rezrov && + git add -N foo && + git add -N dir/baz && + cat >expect <<-\EOF && + AM dir/baz + AM foo + EOF + git status --untracked-files=no --porcelain >actual && + test_cmp actual expect && + git write-tree && + git status --untracked-files=no --porcelain >actual && + test_cmp actual expect +' + test_done -- 1.8.0 Jonathon Mah me@JonathonMah.com ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug: write-tree corrupts intent-to-add index state 2012-11-06 8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah @ 2012-11-06 12:37 ` Nguyen Thai Ngoc Duy 2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy 1 sibling, 0 replies; 16+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-11-06 12:37 UTC (permalink / raw) To: Jonathon Mah; +Cc: git@vger.kernel.org On Tue, Nov 06, 2012 at 12:53:27AM -0800, Jonathon Mah wrote: > It appears the index forgot that those files were new. So not only > has the intent-to-add status been lost, but git status shows a file > existing in neither HEAD nor the index as not-new-but-modified. > > Tracing through it, I narrowed it down to git write-tree affecting > the index state. As far as I can tell, it's incorrect for that > command to change the index. I'm now out of my (shallow) depth in > the git codebase, so I'm throwing it out there for someone to tackle > (hopefully). I've attached a failing test case. I played with your test a bit and found that if we skip the commit step, the bug disappears. I checked and believe that i-t-a flag in index is preserved, which leaves cache-tree as the culprit. If cache-tree is (incorrectly) valid, diff operations won't bother checking index. We used to not allow writing tree with i-t-a entries present. Then we allowed it, but forgot that we need to invalidate any paths that contain i-t-a entries, otherwise cache-tree won't truly reflect index. The below patch seems to do the trick, but I'd rather do the invalidation inside update_one() so we don't need to traverse over the index again. I haven't succesfully put cache-tree.c back in my head again to make it happen. Anybody is welcome to step up, or I'll finish it this weekend. -- 8< -- diff --git a/cache-tree.c b/cache-tree.c index 28ed657..30a8018 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it, i = update_one(it, cache, entries, "", 0, flags); if (i < 0) return i; + for (i = 0; i < entries; i++) + if (cache[i]->ce_flags & CE_INTENT_TO_ADD) + cache_tree_invalidate_path(it, cache[i]->name); return 0; } diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index ec35409..3ddbd89 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' git commit -a -m all ' +test_expect_success 'cache-tree invalidates i-t-a paths' ' + git reset --hard && + mkdir dir && + : >dir/foo && + git add dir/foo && + git commit -m foo && + + : >dir/bar && + git add -N dir/bar && + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual && + + git write-tree >/dev/null && + + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual +' + test_done -- 8< -- ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH] cache-tree: invalidate i-t-a paths after writing trees 2012-11-06 8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah 2012-11-06 12:37 ` Nguyen Thai Ngoc Duy @ 2012-11-09 11:04 ` Nguyễn Thái Ngọc Duy 2012-11-09 11:57 ` Junio C Hamano 2012-12-08 4:10 ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy 1 sibling, 2 replies; 16+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-11-09 11:04 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Jonathon Mah, Nguyễn Thái Ngọc Duy Intent-to-add entries used to forbid writing trees so it was not a problem. After commit 3f6d56d (commit: ignore intent-to-add entries instead of refusing - 2012-02-07), an index with i-t-a entries can write trees. However, the commit forgets to invalidate all paths leading to i-t-a entries. With fully valid cache-tree (e.g. after commit or write-tree), diff operations may prefer cache-tree to index and not see i-t-a entries in the index, because cache-tree does not have them. The invalidation is done after calling update_one() in cache_tree_update() for simplicity because it's probably not worth the complexity of changing a recursive function and the performance bottleneck won't likely fall to this new loop, rather in write_sha1_file or hash_sha1_file. But this means that if update_one() has new call sites, they must re-do the same what this commit does to avoid the same fault. Reported-by: Jonathon Mah <me@JonathonMah.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- cache-tree.c | 3 +++ t/t2203-add-intent.sh | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index 28ed657..30a8018 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it, i = update_one(it, cache, entries, "", 0, flags); if (i < 0) return i; + for (i = 0; i < entries; i++) + if (cache[i]->ce_flags & CE_INTENT_TO_ADD) + cache_tree_invalidate_path(it, cache[i]->name); return 0; } diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index ec35409..2a4a749 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' git commit -a -m all ' +test_expect_success 'cache-tree invalidates i-t-a paths' ' + git reset --hard && + mkdir dir && + : >dir/foo && + git add dir/foo && + git commit -m foo && + + : >dir/bar && + git add -N dir/bar && + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual && + + git write-tree >/dev/null && + + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual +' + test_done -- 1.8.0.rc2.23.g1fb49df ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees 2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy @ 2012-11-09 11:57 ` Junio C Hamano 2012-11-10 11:04 ` Nguyen Thai Ngoc Duy 2012-12-08 4:10 ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2012-11-09 11:57 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > diff --git a/cache-tree.c b/cache-tree.c > index 28ed657..30a8018 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it, > i = update_one(it, cache, entries, "", 0, flags); > if (i < 0) > return i; > + for (i = 0; i < entries; i++) > + if (cache[i]->ce_flags & CE_INTENT_TO_ADD) > + cache_tree_invalidate_path(it, cache[i]->name); > return 0; > } I notice there is another special case for CE_REMOVE but there is nothing that adjusts the cache-tree for these entries in the current codebase. I suspect the original code before we (perhaps incorrectly) updated the code not to error out upon I-T-A entries was fine only because we do not attempt to fully populate the cache-tree during a merge in the unpack-trees codepath, which will mark the index entries that are to be removed with CE_REMOVE in the resulting index. The solution implemented with this patch will break if we start updating the cache tree after a successful merge in unpack-trees, I suspect. An alternative might be to add a "phoney" bit next to "used" in the cache_tree structure, mark the cache tree as phoney when we skip an entry marked as CE_REMOVE or CE_ITA, and make the postprocessing loop this patch adds aware of that bit, instead of iterating over the index entries; instead, it would recurse the resulting cache tree and invalidate parts of the tree that have subtrees with the "phoney" bit set, or something. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees 2012-11-09 11:57 ` Junio C Hamano @ 2012-11-10 11:04 ` Nguyen Thai Ngoc Duy 2012-11-30 0:06 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-11-10 11:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah On Fri, Nov 9, 2012 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > >> diff --git a/cache-tree.c b/cache-tree.c >> index 28ed657..30a8018 100644 >> --- a/cache-tree.c >> +++ b/cache-tree.c >> @@ -381,6 +381,9 @@ int cache_tree_update(struct cache_tree *it, >> i = update_one(it, cache, entries, "", 0, flags); >> if (i < 0) >> return i; >> + for (i = 0; i < entries; i++) >> + if (cache[i]->ce_flags & CE_INTENT_TO_ADD) >> + cache_tree_invalidate_path(it, cache[i]->name); >> return 0; >> } > > I notice there is another special case for CE_REMOVE but there is > nothing that adjusts the cache-tree for these entries in the current > codebase. > > I suspect the original code before we (perhaps incorrectly) updated > the code not to error out upon I-T-A entries was fine only because > we do not attempt to fully populate the cache-tree during a merge in > the unpack-trees codepath, which will mark the index entries that > are to be removed with CE_REMOVE in the resulting index. > > The solution implemented with this patch will break if we start > updating the cache tree after a successful merge in unpack-trees, I > suspect. I don't understand. I thought we handled CE_REMOVE correctly (i.e. no CE_REMOVE entries in cache tree even after a successful merge). Or should we keep CE_REMOVE in cache tree after a successful merge? > An alternative might be to add a "phoney" bit next to "used" in the > cache_tree structure, mark the cache tree as phoney when we skip an > entry marked as CE_REMOVE or CE_ITA, and make the postprocessing > loop this patch adds aware of that bit, instead of iterating over > the index entries; instead, it would recurse the resulting cache > tree and invalidate parts of the tree that have subtrees with the > "phoney" bit set, or something. Yeah, that sounds better. -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees 2012-11-10 11:04 ` Nguyen Thai Ngoc Duy @ 2012-11-30 0:06 ` Junio C Hamano 2012-11-30 1:26 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2012-11-30 0:06 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: git, Jeff King, Jonathon Mah Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: >> An alternative might be to add a "phoney" bit next to "used" in the >> cache_tree structure, mark the cache tree as phoney when we skip an >> entry marked as CE_REMOVE or CE_ITA, and make the postprocessing >> loop this patch adds aware of that bit, instead of iterating over >> the index entries; instead, it would recurse the resulting cache >> tree and invalidate parts of the tree that have subtrees with the >> "phoney" bit set, or something. > > Yeah, that sounds better. Did anything happen to this topic after this? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] cache-tree: invalidate i-t-a paths after writing trees 2012-11-30 0:06 ` Junio C Hamano @ 2012-11-30 1:26 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 16+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-11-30 1:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah On Fri, Nov 30, 2012 at 7:06 AM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > >>> An alternative might be to add a "phoney" bit next to "used" in the >>> cache_tree structure, mark the cache tree as phoney when we skip an >>> entry marked as CE_REMOVE or CE_ITA, and make the postprocessing >>> loop this patch adds aware of that bit, instead of iterating over >>> the index entries; instead, it would recurse the resulting cache >>> tree and invalidate parts of the tree that have subtrees with the >>> "phoney" bit set, or something. >> >> Yeah, that sounds better. > > Did anything happen to this topic after this? Not from my side because I forgot to mark this thread as a todo item and unsurprisingly forgot about it. -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees 2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy 2012-11-09 11:57 ` Junio C Hamano @ 2012-12-08 4:10 ` Nguyễn Thái Ngọc Duy 2012-12-10 6:50 ` Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-12-08 4:10 UTC (permalink / raw) To: git Cc: Jeff King, Junio C Hamano, Jonathon Mah, Nguyễn Thái Ngọc Duy Intent-to-add entries used to forbid writing trees so it was not a problem. After commit 3f6d56d (commit: ignore intent-to-add entries instead of refusing - 2012-02-07), we can generate trees from an index with i-t-a entries. However, the commit forgets to invalidate all paths leading to i-t-a entries. With fully valid cache-tree (e.g. after commit or write-tree), diff operations may prefer cache-tree to index and not see i-t-a entries in the index, because cache-tree does not have them. Reported-by: Jonathon Mah <me@JonathonMah.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Sorry for the late update, I have little time for git these days. It turns out that we don't need phony bit and another round for invalidation. We can do it in update_one. cache-tree.c | 30 ++++++++++++++++++++++++++---- t/t2203-add-intent.sh | 20 ++++++++++++++++++++ 2 files changed, 46 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 28ed657..989a7ff 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it, int missing_ok = flags & WRITE_TREE_MISSING_OK; int dryrun = flags & WRITE_TREE_DRY_RUN; int i; + int to_invalidate = 0; if (0 <= it->entry_count && has_sha1_file(it->sha1)) return it->entry_count; @@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it, if (!sub) die("cache-tree.c: '%.*s' in '%s' not found", entlen, path + baselen, path); - i += sub->cache_tree->entry_count - 1; + i--; /* this entry is already counted in "sub" */ + if (sub->cache_tree->entry_count < 0) { + i -= sub->cache_tree->entry_count; + to_invalidate = 1; + } + else + i += sub->cache_tree->entry_count; sha1 = sub->cache_tree->sha1; mode = S_IFDIR; } @@ -339,8 +346,23 @@ static int update_one(struct cache_tree *it, mode, sha1_to_hex(sha1), entlen+baselen, path); } - if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD)) - continue; /* entry being removed or placeholder */ + /* + * CE_REMOVE entries are removed before the index is + * written to disk. Skip them to remain consistent + * with the future on-disk index. + */ + if (ce->ce_flags & CE_REMOVE) + continue; + + /* + * CE_INTENT_TO_ADD entries exist on on-disk index but + * they are not part of generated trees. Invalidate up + * to root to force cache-tree users to read elsewhere. + */ + if (ce->ce_flags & CE_INTENT_TO_ADD) { + to_invalidate = 1; + continue; + } strbuf_grow(&buffer, entlen + 100); strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0'); @@ -360,7 +382,7 @@ static int update_one(struct cache_tree *it, } strbuf_release(&buffer); - it->entry_count = i; + it->entry_count = to_invalidate ? -i : i; #if DEBUG fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n", it->entry_count, it->subtree_nr, diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index ec35409..2a4a749 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' git commit -a -m all ' +test_expect_success 'cache-tree invalidates i-t-a paths' ' + git reset --hard && + mkdir dir && + : >dir/foo && + git add dir/foo && + git commit -m foo && + + : >dir/bar && + git add -N dir/bar && + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual && + + git write-tree >/dev/null && + + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual +' + test_done -- 1.8.0.rc2.23.g1fb49df ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-08 4:10 ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy @ 2012-12-10 6:50 ` Junio C Hamano 2012-12-10 11:53 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2012-12-10 6:50 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > diff --git a/cache-tree.c b/cache-tree.c > index 28ed657..989a7ff 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it, > int missing_ok = flags & WRITE_TREE_MISSING_OK; > int dryrun = flags & WRITE_TREE_DRY_RUN; > int i; > + int to_invalidate = 0; > > if (0 <= it->entry_count && has_sha1_file(it->sha1)) > return it->entry_count; > @@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it, > if (!sub) > die("cache-tree.c: '%.*s' in '%s' not found", > entlen, path + baselen, path); > - i += sub->cache_tree->entry_count - 1; > + i--; /* this entry is already counted in "sub" */ > + if (sub->cache_tree->entry_count < 0) { > + i -= sub->cache_tree->entry_count; > + to_invalidate = 1; > + } > + else > + i += sub->cache_tree->entry_count; Hrm. update_one() is prepared to see a cache-tree whose entry count is zero (see the context lines in the previous hunk) and the invariant for the rest of the code is "if 0 <= entry_count, the cached tree is valid; invalid cache-tree has -1 in entry_count. More importantly, entry_count negated does not in general express how many entries there are in the subtree and does not tell us how many index entries to skip. > @@ -339,8 +346,23 @@ static int update_one(struct cache_tree *it, > mode, sha1_to_hex(sha1), entlen+baselen, path); > } > > - if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD)) > - continue; /* entry being removed or placeholder */ > + /* > + * CE_REMOVE entries are removed before the index is > + * written to disk. Skip them to remain consistent > + * with the future on-disk index. > + */ > + if (ce->ce_flags & CE_REMOVE) > + continue; > + > + /* > + * CE_INTENT_TO_ADD entries exist on on-disk index but > + * they are not part of generated trees. Invalidate up > + * to root to force cache-tree users to read elsewhere. > + */ > + if (ce->ce_flags & CE_INTENT_TO_ADD) { > + to_invalidate = 1; > + continue; > + } Thanks for documenting these. > @@ -360,7 +382,7 @@ static int update_one(struct cache_tree *it, > } > > strbuf_release(&buffer); > - it->entry_count = i; > + it->entry_count = to_invalidate ? -i : i; See above. I am not fundamentally opposed to a change to redefine entry_count so that it always maintains how many index entries the subtree covers, even for invalidated subtree, but I do not think this patch alone is sufficient to maintain such invariant. > #if DEBUG > fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n", > it->entry_count, it->subtree_nr, > diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh > index ec35409..2a4a749 100755 > --- a/t/t2203-add-intent.sh > +++ b/t/t2203-add-intent.sh > @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' > git commit -a -m all > ' > > +test_expect_success 'cache-tree invalidates i-t-a paths' ' > + git reset --hard && > + mkdir dir && > + : >dir/foo && > + git add dir/foo && > + git commit -m foo && > + > + : >dir/bar && > + git add -N dir/bar && > + git diff --cached --name-only >actual && > + echo dir/bar >expect && > + test_cmp expect actual && > + > + git write-tree >/dev/null && > + > + git diff --cached --name-only >actual && > + echo dir/bar >expect && > + test_cmp expect actual > +' > + > test_done ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-10 6:50 ` Junio C Hamano @ 2012-12-10 11:53 ` Nguyen Thai Ngoc Duy 2012-12-10 17:22 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-12-10 11:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah On Mon, Dec 10, 2012 at 1:50 PM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > >> diff --git a/cache-tree.c b/cache-tree.c >> index 28ed657..989a7ff 100644 >> --- a/cache-tree.c >> +++ b/cache-tree.c >> @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it, >> int missing_ok = flags & WRITE_TREE_MISSING_OK; >> int dryrun = flags & WRITE_TREE_DRY_RUN; >> int i; >> + int to_invalidate = 0; >> >> if (0 <= it->entry_count && has_sha1_file(it->sha1)) >> return it->entry_count; >> @@ -324,7 +325,13 @@ static int update_one(struct cache_tree *it, >> if (!sub) >> die("cache-tree.c: '%.*s' in '%s' not found", >> entlen, path + baselen, path); >> - i += sub->cache_tree->entry_count - 1; >> + i--; /* this entry is already counted in "sub" */ >> + if (sub->cache_tree->entry_count < 0) { >> + i -= sub->cache_tree->entry_count; >> + to_invalidate = 1; >> + } >> + else >> + i += sub->cache_tree->entry_count; > > Hrm. update_one() is prepared to see a cache-tree whose entry count > is zero (see the context lines in the previous hunk) and the > invariant for the rest of the code is "if 0 <= entry_count, the > cached tree is valid; invalid cache-tree has -1 in entry_count. > More importantly, entry_count negated does not in general express > how many entries there are in the subtree and does not tell us how > many index entries to skip. Yeah I use entry_count for two different things here. In the previous (unsent) version of the patch I had "entry_count = -1" right after "i -= entry_count" >> + if (sub->cache_tree->entry_count < 0) { >> + i -= sub->cache_tree->entry_count; >> + sub->cache_tree->entry_count = -1; >> + to_invalidate = 1; >> + } which makes it clearer that the use of negative entry count is only valid within update_one. Then I changed my mind because it says 'negative means "invalid"' in cache-tree.h. So, put "entry_count = -1" back or just add another field to struct cache_tree for this? -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-10 11:53 ` Nguyen Thai Ngoc Duy @ 2012-12-10 17:22 ` Junio C Hamano 2012-12-13 1:39 ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2012-12-10 17:22 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: git, Jeff King, Jonathon Mah Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > Yeah I use entry_count for two different things here. In the previous > (unsent) version of the patch I had "entry_count = -1" right after "i > -= entry_count" > >>> + if (sub->cache_tree->entry_count < 0) { >>> + i -= sub->cache_tree->entry_count; >>> + sub->cache_tree->entry_count = -1; >>> + to_invalidate = 1; >>> + } > > > which makes it clearer that the use of negative entry count is only > valid within update_one. Then I changed my mind because it says > 'negative means "invalid"' in cache-tree.h. But you make it to mean not just 'negative means invalid' but 'negate it and you can learn how many index entries to skip to reach the entry outside the subtree'. That is what I was wondering about. I do not think that new invariant does not hold in general; it may hold true while inside this function, but immediately after somebody else calls cache_tree_invalidate_path() it won't be true anymore. Leaking the information to outside callers that can easily be mistaken as if it may mean something without any comment looks like we are asking for trouble. > So, put "entry_count = -1" > back or just add another field to struct cache_tree for this? As long as it is made clear with in-code comment that says "negative entry count, when negated, will give us how many index entries are covered by this invalid cache tree, only inside this function", and such a negative entry_count is reset to -1 to avoid leaking out the value that will soon go stale to the outside world in the "write this tree out" loop, I think the v2 patch is OK, if not ideal. If we were to commit to keep the new invariant true throughout the system, on the other hand, I suspect that a boolean flag "valid" may help people who keep i-t-a entries around across write-tree calls. The first if () statement in the update_one() function can check the validity, and you can say the cache tree is valid even if the section in the index that is covered by the cache-tree contains i-t-a entries _after_ you wrote it out as a tree, no? That may make the semantics a lot cleaner, I suspect. The new semantics might be: * update_one() returns negative only when the section of the index given to it cannot be written out as a tree (i.e. not merged, or corrupt index); * update_one() returns the number of entries in the index covered by the tree, including i-t-a entries (but not REMOVED, as these entries will not be in the index after the cache-tree has been written out); * cache_tree->valid tells if the cache_tree->sha1 is valid; the section of the index the tree covers may or may not have i-t-a entries in it; * cache_tree->entry_count holds the number of index entries the tree covers, couting i-t-a entries. This field is only meaningful when cache_tree->valid is true. or something like that. I noticed that the recent "ignore i-t-a only while writing trees instead of erroring out" broke 331fcb5 (git add --intent-to-add: do not let an empty blob be committed by accident, 2008-11-28) in another way, by the way. In verify_cache(), there is an unreachable else clause to warn about "not added yet" entries that should have been removed but is left behind. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-10 17:22 ` Junio C Hamano @ 2012-12-13 1:39 ` Nguyễn Thái Ngọc Duy 2012-12-13 1:39 ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy 2012-12-13 2:04 ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano 0 siblings, 2 replies; 16+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-12-13 1:39 UTC (permalink / raw) To: git Cc: Jeff King, Junio C Hamano, Jonathon Mah, Nguyễn Thái Ngọc Duy Intent-to-add entries used to forbid writing trees so it was not a problem. After commit 3f6d56d (commit: ignore intent-to-add entries instead of refusing - 2012-02-07), we can generate trees from an index with i-t-a entries. However, the commit forgets to invalidate all paths leading to i-t-a entries. With fully valid cache-tree (e.g. after commit or write-tree), diff operations may prefer cache-tree to index and not see i-t-a entries in the index, because cache-tree does not have them. Reported-by: Jonathon Mah <me@JonathonMah.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- This version ensures that entry_count can only be >= -1 after update_one returns. Not ideal but good enough. cache-tree.c | 40 ++++++++++++++++++++++++++++++++++++---- t/t2203-add-intent.sh | 20 ++++++++++++++++++++ 2 files changed, 56 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 28ed657..1fbc81a 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it, int missing_ok = flags & WRITE_TREE_MISSING_OK; int dryrun = flags & WRITE_TREE_DRY_RUN; int i; + int to_invalidate = 0; if (0 <= it->entry_count && has_sha1_file(it->sha1)) return it->entry_count; @@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it, if (!sub) die("cache-tree.c: '%.*s' in '%s' not found", entlen, path + baselen, path); - i += sub->cache_tree->entry_count - 1; + i--; /* this entry is already counted in "sub" */ + if (sub->cache_tree->entry_count < 0) { + i -= sub->cache_tree->entry_count; + sub->cache_tree->entry_count = -1; + to_invalidate = 1; + } + else + i += sub->cache_tree->entry_count; sha1 = sub->cache_tree->sha1; mode = S_IFDIR; } @@ -339,8 +347,23 @@ static int update_one(struct cache_tree *it, mode, sha1_to_hex(sha1), entlen+baselen, path); } - if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD)) - continue; /* entry being removed or placeholder */ + /* + * CE_REMOVE entries are removed before the index is + * written to disk. Skip them to remain consistent + * with the future on-disk index. + */ + if (ce->ce_flags & CE_REMOVE) + continue; + + /* + * CE_INTENT_TO_ADD entries exist on on-disk index but + * they are not part of generated trees. Invalidate up + * to root to force cache-tree users to read elsewhere. + */ + if (ce->ce_flags & CE_INTENT_TO_ADD) { + to_invalidate = 1; + continue; + } strbuf_grow(&buffer, entlen + 100); strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0'); @@ -360,7 +383,7 @@ static int update_one(struct cache_tree *it, } strbuf_release(&buffer); - it->entry_count = i; + it->entry_count = to_invalidate ? -i : i; #if DEBUG fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n", it->entry_count, it->subtree_nr, @@ -381,6 +404,15 @@ int cache_tree_update(struct cache_tree *it, i = update_one(it, cache, entries, "", 0, flags); if (i < 0) return i; + /* + * update_one() uses negative entry_count as a way to mark an + * entry invalid _and_ pass the number of entries back to + * itself at the parent level. This is for internal use and + * should not be leaked out after the top-level update_one + * exits. + */ + if (it->entry_count < 0) + it->entry_count = -1; return 0; } diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index ec35409..2a4a749 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -62,5 +62,25 @@ test_expect_success 'can "commit -a" with an i-t-a entry' ' git commit -a -m all ' +test_expect_success 'cache-tree invalidates i-t-a paths' ' + git reset --hard && + mkdir dir && + : >dir/foo && + git add dir/foo && + git commit -m foo && + + : >dir/bar && + git add -N dir/bar && + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual && + + git write-tree >/dev/null && + + git diff --cached --name-only >actual && + echo dir/bar >expect && + test_cmp expect actual +' + test_done -- 1.8.0.rc2.23.g1fb49df ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() 2012-12-13 1:39 ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy @ 2012-12-13 1:39 ` Nguyễn Thái Ngọc Duy 2012-12-13 18:34 ` Junio C Hamano 2012-12-13 2:04 ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-12-13 1:39 UTC (permalink / raw) To: git Cc: Jeff King, Junio C Hamano, Jonathon Mah, Nguyễn Thái Ngọc Duy This code is added in 331fcb5 (git add --intent-to-add: do not let an empty blob be committed by accident - 2008-11-28) to forbid committing when i-t-a entries are present. When we allow that, we forgot to remove this unreachable code. Noticed-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- cache-tree.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 1fbc81a..3e79e27 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -166,12 +166,8 @@ static int verify_cache(struct cache_entry **cache, fprintf(stderr, "...\n"); break; } - if (ce_stage(ce)) - fprintf(stderr, "%s: unmerged (%s)\n", - ce->name, sha1_to_hex(ce->sha1)); - else - fprintf(stderr, "%s: not added yet\n", - ce->name); + fprintf(stderr, "%s: unmerged (%s)\n", + ce->name, sha1_to_hex(ce->sha1)); } } if (funny) -- 1.8.0.rc2.23.g1fb49df ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() 2012-12-13 1:39 ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy @ 2012-12-13 18:34 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2012-12-13 18:34 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah Will replace the one in 'pu' with these two. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-13 1:39 ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy 2012-12-13 1:39 ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy @ 2012-12-13 2:04 ` Junio C Hamano 2012-12-15 2:52 ` Nguyen Thai Ngoc Duy 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2012-12-13 2:04 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Jonathon Mah Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Intent-to-add entries used to forbid writing trees so it was not a > problem. After commit 3f6d56d (commit: ignore intent-to-add entries > instead of refusing - 2012-02-07), we can generate trees from an index > with i-t-a entries. > > However, the commit forgets to invalidate all paths leading to i-t-a > entries. With fully valid cache-tree (e.g. after commit or > write-tree), diff operations may prefer cache-tree to index and not > see i-t-a entries in the index, because cache-tree does not have them. > > Reported-by: Jonathon Mah <me@JonathonMah.com> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > This version ensures that entry_count can only be >= -1 after > update_one returns. Not ideal but good enough. > > cache-tree.c | 40 ++++++++++++++++++++++++++++++++++++---- > t/t2203-add-intent.sh | 20 ++++++++++++++++++++ > 2 files changed, 56 insertions(+), 4 deletions(-) > > diff --git a/cache-tree.c b/cache-tree.c > index 28ed657..1fbc81a 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -248,6 +248,7 @@ static int update_one(struct cache_tree *it, > int missing_ok = flags & WRITE_TREE_MISSING_OK; > int dryrun = flags & WRITE_TREE_DRY_RUN; > int i; > + int to_invalidate = 0; > > if (0 <= it->entry_count && has_sha1_file(it->sha1)) > return it->entry_count; > @@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it, > if (!sub) > die("cache-tree.c: '%.*s' in '%s' not found", > entlen, path + baselen, path); > - i += sub->cache_tree->entry_count - 1; > + i--; /* this entry is already counted in "sub" */ Huh? The "-1" in the original is the bog-standard compensation for the for(;;i++) loop. > + if (sub->cache_tree->entry_count < 0) { > + i -= sub->cache_tree->entry_count; > + sub->cache_tree->entry_count = -1; > + to_invalidate = 1; > + } > + else > + i += sub->cache_tree->entry_count; While the rewritten version is not *wrong* per-se, I have a feeling that it may be much easier to read if written like this: if (sub->cache_tree_entry_count < 0) { to_invalidate = 1; to_skip = 0 - sub->cache_tree_entry_count; sub->cache_tree_entry_count = -1; } else { to_skip = sub->cache_tree_entry_count; } i += to_skip - 1; > @@ -360,7 +383,7 @@ static int update_one(struct cache_tree *it, > } > > strbuf_release(&buffer); > - it->entry_count = i; > + it->entry_count = to_invalidate ? -i : i; > #if DEBUG > fprintf(stderr, "cache-tree update-one (%d ent, %d subtree) %s\n", > it->entry_count, it->subtree_nr, > @@ -381,6 +404,15 @@ int cache_tree_update(struct cache_tree *it, > i = update_one(it, cache, entries, "", 0, flags); > if (i < 0) > return i; > + /* > + * update_one() uses negative entry_count as a way to mark an > + * entry invalid _and_ pass the number of entries back to > + * itself at the parent level. This is for internal use and > + * should not be leaked out after the top-level update_one > + * exits. > + */ > + if (it->entry_count < 0) > + it->entry_count = -1; Nice. I think what entry_count means immediately after update_one() returned should be commented near the beginning of that function, too, though. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees 2012-12-13 2:04 ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano @ 2012-12-15 2:52 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 16+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-12-15 2:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Jonathon Mah On Thu, Dec 13, 2012 at 9:04 AM, Junio C Hamano <gitster@pobox.com> wrote: >> @@ -324,7 +325,14 @@ static int update_one(struct cache_tree *it, >> if (!sub) >> die("cache-tree.c: '%.*s' in '%s' not found", >> entlen, path + baselen, path); >> - i += sub->cache_tree->entry_count - 1; >> + i--; /* this entry is already counted in "sub" */ > > Huh? > > The "-1" in the original is the bog-standard compensation for the > for(;;i++) loop. Exactly. It took me a while to figure out what " - 1" was for and I wanted to avoid that for other developers. Only I worded it badly. I'll replace the for loop with a while loop to make it clearer... > >> + if (sub->cache_tree->entry_count < 0) { >> + i -= sub->cache_tree->entry_count; >> + sub->cache_tree->entry_count = -1; >> + to_invalidate = 1; >> + } >> + else >> + i += sub->cache_tree->entry_count; > > While the rewritten version is not *wrong* per-se, I have a feeling > that it may be much easier to read if written like this: > > if (sub->cache_tree_entry_count < 0) { > to_invalidate = 1; > to_skip = 0 - sub->cache_tree_entry_count; > sub->cache_tree_entry_count = -1; > } else { > to_skip = sub->cache_tree_entry_count; > } > i += to_skip - 1; > ..or this would be fine too. Which way to go? A while we're still at the cache tree > - if (ce->ce_flags & (CE_REMOVE | CE_INTENT_TO_ADD)) > - continue; /* entry being removed or placeholder */ > + /* > + * CE_REMOVE entries are removed before the index is > + * written to disk. Skip them to remain consistent > + * with the future on-disk index. > + */ > + if (ce->ce_flags & CE_REMOVE) > + continue; > + A CE_REMOVE entry is removed later from the index, however it is still counted in entry_count. entry_count serves two purposes: to skip the number of processed entries in the in-memory index, and to record the number of entries in the on-disk index. These numbers do not match when CE_REMOVE is present. We have correct in-memory entry_count, which means incorrect on-disk entry_count in this case. I tested with an index that has a/b and a/c. The latter has CE_REMOVE. After writing cache tree I get: $ git ls-files --stage 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 a/b $ ../test-dump-cache-tree 878e27c626266ac04087a203e4bdd396dcf74763 (2 entries, 1 subtrees) 878e27c626266ac04087a203e4bdd396dcf74763 #(ref) (1 entries, 1 subtrees) 4277b6e69d25e5efa77c455340557b384a4c018a a/ (2 entries, 0 subtrees) 4277b6e69d25e5efa77c455340557b384a4c018a #(ref) a/ (1 entries, 0 subtrees) If I throw out that index, create a new one with a/b alone and write-tree, I get $ ../test-dump-cache-tree 878e27c626266ac04087a203e4bdd396dcf74763 (1 entries, 1 subtrees) 4277b6e69d25e5efa77c455340557b384a4c018a a/ (1 entries, 0 subtrees) Shall we fix this too? I'm thinking of adding "skip" argument to update_one and i += sub->cache_tree->entry_count - 1; would become i += sub->cache_tree->entry_count + skip - 1; and entry_count would always reflect on-disk value. This "skip" can be reused for this i-t-a patch as well. -- Duy ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-12-15 2:53 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-11-06 8:53 Bug: write-tree corrupts intent-to-add index state Jonathon Mah 2012-11-06 12:37 ` Nguyen Thai Ngoc Duy 2012-11-09 11:04 ` [PATCH] cache-tree: invalidate i-t-a paths after writing trees Nguyễn Thái Ngọc Duy 2012-11-09 11:57 ` Junio C Hamano 2012-11-10 11:04 ` Nguyen Thai Ngoc Duy 2012-11-30 0:06 ` Junio C Hamano 2012-11-30 1:26 ` Nguyen Thai Ngoc Duy 2012-12-08 4:10 ` [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees Nguyễn Thái Ngọc Duy 2012-12-10 6:50 ` Junio C Hamano 2012-12-10 11:53 ` Nguyen Thai Ngoc Duy 2012-12-10 17:22 ` Junio C Hamano 2012-12-13 1:39 ` [PATCH v3 1/2] " Nguyễn Thái Ngọc Duy 2012-12-13 1:39 ` [PATCH v3 2/2] cache-tree: remove dead i-t-a code in verify_cache() Nguyễn Thái Ngọc Duy 2012-12-13 18:34 ` Junio C Hamano 2012-12-13 2:04 ` [PATCH v3 1/2] cache-tree: invalidate i-t-a paths after generating trees Junio C Hamano 2012-12-15 2:52 ` Nguyen Thai Ngoc Duy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).