[RFH] CE_REMOVE conversion

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFH] CE_REMOVE conversion
@ 2008-02-21  8:39 Junio C Hamano
  2008-02-21  9:07 ` [RFH] index_name_exists() conversion Junio C Hamano
  2008-02-21 16:05 ` [RFH] CE_REMOVE conversion Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2008-02-21  8:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

You converted "ce->ce_mode = 0" to "ce->ce_flags |= CE_REMOVE" in an
earlier commit 7a51ed6 (Make on-disk index representation
separate from in-core one).

I am having two issues with this conversion, related to read-tree.

If you say "git reset --hard" with an unmerged path in the
index, the entry does not get resurrected from the HEAD.  It
instead just goes away (i.e. you lose a path in the index).
If you run "git reset --hard" twice, the second run will
resurrect it, of course, as the first one removed the unmerged
paths.

"git reset --hard" internally runs "read-tree -u --reset HEAD",
and the oneway_merge() misbehaves.

        commit 7a51ed66f653c248993b3c4a61932e47933d835e
        Author: Linus Torvalds <torvalds@linux-foundation.org>
        Date:   Mon Jan 14 16:03:17 2008 -0800

            Make on-disk index representation separate from in-core one

        diff --git a/builtin-read-tree.c b/builtin-read-tree.c
        index c0ea034..5785401 100644
        --- a/builtin-read-tree.c
        +++ b/builtin-read-tree.c
        @@ -45,8 +45,7 @@ static int read_cache_unmerged(void)
                                        continue;
                                cache_tree_invalidate_path(active_cache_tree, ce->name);
                                last = ce;
        -			ce->ce_mode = 0;
        -			ce->ce_flags &= ~htons(CE_STAGEMASK);
        +			ce->ce_flags |= CE_REMOVE;
                        }
                        *dst++ = ce;
                }

One issue is somewhat apparent.  The conversion forgot to drop
the stage information (i.e. it does not tell "that stage#0 path
is to be removed" anymore).

Another thing is a bit trickier.  Now because you do not smudge
ce->ce_mode, when oneway_merge in unpack-trees.c compares it
(which comes as *old) with what is read from HEAD, it triggers
this codepath:

	if (old && same(old, a)) {
		if (o->reset) {
			struct stat st;
			if (lstat(old->name, &st) ||
			    ce_match_stat(old, &st, CE_MATCH_IGNORE_VALID))
				old->ce_flags |= CE_UPDATE;
		}
		return keep_entry(old, o);
	}

Here, same(old, a) yields true, old->ce_flags gets CE_UPDATE,
and then keep_entry(old, o) keeps that old entry, which is at
stage #1 and has (CE_REMOVE|CE_UPDATE) flags set.  This index is
written out, making the resulting index empty.

The reason we keep an index entry with ce_mode = 0 (and now
CE_REMOVE) when we want to remive it is because we would want to
be able to say "propagate this change to the work tree" when run
with CE_UPDATE.  But I think the reason read_cache_unmerged()
says "this unmerged entry is gone" is _not_ because we would
want to remove the corresponding path from the work tree.

The old code happened to work because "ce_mode = 0" entries
would have never matched with what was read from the HEAD tree,
and we would never have triggered the keep_entry() codepath.

We could of course hack read_cache_unmerged() to:

	if (ce_stage(ce)) {
		if (last && !strcmp(ce->name, last->name))
			continue;
		cache_tree_invalidate_path(active_cache_tree, ce->name);
		last = ce;
		ce->ce_flags &= ~CE_STAGEMASK;
		/* Do not match with entries from trees! */
		ce->ce_mode = 0;
	}
	*dst++ = ce;

but I am wondering if we should instead really _remove_ entries
from the index instead, just like the attached patch.

Thoughts?

 builtin-read-tree.c |    2 +-
 t/t7104-reset.sh    |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 5785401..726fb0b 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -45,7 +45,7 @@ static int read_cache_unmerged(void)
 				continue;
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
 			last = ce;
-			ce->ce_flags |= CE_REMOVE;
+			continue;
 		}
 		*dst++ = ce;
 	}
diff --git a/t/t7104-reset.sh b/t/t7104-reset.sh
new file mode 100755
index 0000000..831078c
--- /dev/null
+++ b/t/t7104-reset.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='reset --hard unmerged'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+	>hello &&
+	git add hello &&
+	git commit -m world &&
+
+	H=$(git rev-parse :hello) &&
+	git rm --cached hello &&
+	for i in 1 2 3
+	do
+		echo "100644 $H $i	hello"
+	done | git update-index --index-info &&
+
+	rm -f hello &&
+	mkdir -p hello &&
+	>hello/world &&
+	test "$(git ls-files -o)" = hello/world
+
+'
+
+test_expect_failure 'reset --hard loses the index' '
+
+	git reset --hard &&
+	git ls-files --error-unmatch hello &&
+	test -f hello
+
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFH] index_name_exists() conversion
  2008-02-21  8:39 [RFH] CE_REMOVE conversion Junio C Hamano
@ 2008-02-21  9:07 ` Junio C Hamano
  2008-02-21 16:15   ` Linus Torvalds
  2008-02-21 16:05 ` [RFH] CE_REMOVE conversion Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2008-02-21  9:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

While (hopefully) I have your attention, I have another question
about the recent optimization.

        commit cf558704fb68514a820e3666968967c900e0fd29
        Author: Linus Torvalds <torvalds@linux-foundation.org>

            Create pathname-based hash-table lookup into index

        diff --git a/dir.c b/dir.c
        index 1b9cc7a..6543105 100644
        --- a/dir.c
        +++ b/dir.c
        @@ -346,7 +346,7 @@ static struct dir_entry *dir_entry_new(cons...

         struct dir_entry *dir_add_name(struct dir_struct *dir, const c...
         {
        -	if (cache_name_pos(pathname, len) >= 0)
        +	if (cache_name_exists(pathname, len))
                        return NULL;

                ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);

This used to check if there is a stage#0 entry of that name in
the index.  But the name-hash based index look-up does not care
about the stage as far as I can tell.

I haven't managed to come up with a failure case to demonstrate,
but I think this can affect ls-files and clean, potentially in a
bad way.  "git add dir/" to resolve unmerged paths under dir/
somewhere should not be affected, as the codepath calls
read_directory() before reading the cache (so the check will
always say "Nope, we do not have such a path in the index yet").

I am not sure what the right solution would be.  The motivation
for the patch was to make "does this name exist in the index"
and also "does this exist under an equivalent name" efficient,
but it is asking a question that is not precise enough.  Are we
interested in such an entry at any stage, or only at stage#0, or
what?

This could be related to the recent "status segfaults, bisected
to cf558704" issue posted to the list.  I dunno.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] index_name_exists() conversion
  2008-02-21  9:07 ` [RFH] index_name_exists() conversion Junio C Hamano
@ 2008-02-21 16:15   ` Linus Torvalds
  2008-02-21 16:29     ` Johannes Schindelin
  2008-02-21 17:00     ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2008-02-21 16:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 21 Feb 2008, Junio C Hamano wrote:
>
>          struct dir_entry *dir_add_name(struct dir_struct *dir, const c...
>          {
>         -	if (cache_name_pos(pathname, len) >= 0)
>         +	if (cache_name_exists(pathname, len))
>                         return NULL;
> 
>                 ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
> 
> This used to check if there is a stage#0 entry of that name in
> the index.  But the name-hash based index look-up does not care
> about the stage as far as I can tell.

I think the new code is much more correct, and we should always care about 
whether the name exists in the index, not whether it exists in stage#0.

IOW, if we ask for "show other files", we definitely want to ignore 
paths we already know about, and conversely, if we do an operation that is 
supposed to care about files in a directory we already know about, then an 
unmerged entry should count the same as a merged one.

> This could be related to the recent "status segfaults, bisected
> to cf558704" issue posted to the list.  I dunno.

Is there a backtrace for that? It's certainly possible that some strange 
path depended on only matching merged entries, but I would consider that 
unlikely and buggy.

The segfault is more likely related to something more subtle, like 
actually freeing a "cache_entry" after having removed it from the index 
(which we should never do - they were always supposed to stay around 
because they might not have been allocated with 'malloc()' in the first 
place, but with the name indexing thing it now is *always* a bug to do it, 
even if you _did_ allocate it with free()).

(Also, some code used to just reuse the same cache entry multiple times, 
and that got illegal for the same reason).

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] index_name_exists() conversion
  2008-02-21 16:15   ` Linus Torvalds
@ 2008-02-21 16:29     ` Johannes Schindelin
  2008-02-21 17:00     ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2008-02-21 16:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Thu, 21 Feb 2008, Linus Torvalds wrote:

> I think the new code is much more correct,

Heh.  So much truthiness in this half-sentence ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] index_name_exists() conversion
  2008-02-21 16:15   ` Linus Torvalds
  2008-02-21 16:29     ` Johannes Schindelin
@ 2008-02-21 17:00     ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2008-02-21 17:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Thu, 21 Feb 2008, Linus Torvalds wrote:
> 
> (Also, some code used to just reuse the same cache entry multiple times, 
> and that got illegal for the same reason).

Ahh. I think I may have found it.

It's stupid.

In hash_index_entry(), we insert the entry into the name hash, but if it 
got inserted as the first entry, we don't actually set the ->next pointer 
to NULL.

So the chain would end up pointing to some random crud. I think we were 
just lucky with our allocators generally filling those things with zero.

See if this fixes it.

		Linus

---
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -39,6 +39,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	void **pos;
 	unsigned int hash = hash_name(ce->name, ce_namelen(ce));
 
+	ce->next = NULL;
 	pos = insert_hash(hash, ce, &istate->name_hash);
 	if (pos) {
 		ce->next = *pos;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] CE_REMOVE conversion
  2008-02-21  8:39 [RFH] CE_REMOVE conversion Junio C Hamano
  2008-02-21  9:07 ` [RFH] index_name_exists() conversion Junio C Hamano
@ 2008-02-21 16:05 ` Linus Torvalds
  2008-02-21 16:43   ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2008-02-21 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 21 Feb 2008, Junio C Hamano wrote:
> 
> but I am wondering if we should instead really _remove_ entries
> from the index instead, just like the attached patch.

Absolutely. I think that your patch is "ObviouslyCorrect(tm)", and that 
code should never have used CE_REMOVE in the first place. The fact that 
the old code worked was perhaps intentional, but still very subtle and 
lucky.

The whole (and only) point of CE_REMOVE was and is to do the "-u" handling 
when resolving a merge, and keep the thing in the index in order to later 
still do work with it (ie remove it) when we have verified that the index 
is complete. But that's not what "read_cache_unmerged()" is about: it's 
very explicitly documented to just ignore the unmerged entries.

So your patch looks very good to me. Basically, the merge code absolutely 
does not want to be called with some entries already marked as CE_REMOVE 
(it's supposed to *add* those markers as part of resolving the merge, but 
it is not able to handle them in the source).

So ack, ack, ack.

			Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] CE_REMOVE conversion
  2008-02-21 16:05 ` [RFH] CE_REMOVE conversion Linus Torvalds
@ 2008-02-21 16:43   ` Junio C Hamano
  2008-02-21 17:31     ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2008-02-21 16:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 21 Feb 2008, Junio C Hamano wrote:
>> 
>> but I am wondering if we should instead really _remove_ entries
>> from the index instead, just like the attached patch.
> ...
> So your patch looks very good to me. Basically, the merge code absolutely 
> does not want to be called with some entries already marked as CE_REMOVE 
> (it's supposed to *add* those markers as part of resolving the merge, but 
> it is not able to handle them in the source).
>
> So ack, ack, ack.

And we probably should unhash the entry instead of just removing
it?  Come to think of it, I am starting to wonder if the
entries unpack-trees add to and drop from the index are hashed
and unhashed correctly...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFH] CE_REMOVE conversion
  2008-02-21 16:43   ` Junio C Hamano
@ 2008-02-21 17:31     ` Linus Torvalds
  2008-02-21 20:06       ` Linus Torvalds
  2008-02-21 20:08       ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2008-02-21 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Thu, 21 Feb 2008, Junio C Hamano wrote:
> 
> And we probably should unhash the entry instead of just removing
> it?  Come to think of it, I am starting to wonder if the
> entries unpack-trees add to and drop from the index are hashed
> and unhashed correctly...

Heh. This was exactly what I was working on, and here's an early patch.

I say "early" just because I haven't really gone through it more than 
once, and testing that it passes all the tests.

It includes your "continue" fix, since that was actually related to the 
"remove_index_entry()" addition this needs. It also contains the hash next 
pointer initialization.

The bulk of the patch is moving (and renaming) the "remove_index_entry()" 
function (used to be remove_hash_entry, but it's very specific to the 
index, not to general hashes), and the comment that goes with it.

I also made that "copy a cache entry" thing use a function of its own, and 
made sure it saves/restores the hash pointer rather than using a naked 
"memcpy()" with offsetof etc. At least it's abstracted away, even if the 
function itself is ugly as sin.

Everything else is really just one-liners.

But I'm still looking at this. In particular, I want to add some 
assertions to make sure the index state matches the name hash state, but 
your lazy patch makes that less convenient.

		Linus

---
 builtin-read-tree.c |    3 ++-
 cache.h             |   21 +++++++++++++++++++++
 read-cache.c        |   19 +++----------------
 unpack-trees.c      |    2 +-
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 5785401..7bdc312 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -41,11 +41,12 @@ static int read_cache_unmerged(void)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce)) {
+			remove_index_entry(ce);
 			if (last && !strcmp(ce->name, last->name))
 				continue;
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
 			last = ce;
-			ce->ce_flags |= CE_REMOVE;
+			continue;
 		}
 		*dst++ = ce;
 	}
diff --git a/cache.h b/cache.h
index e1000bc..32fa6da 100644
--- a/cache.h
+++ b/cache.h
@@ -135,6 +135,27 @@ struct cache_entry {
 #define CE_UPTODATE  (0x40000)
 #define CE_UNHASHED  (0x80000)
 
+static inline void copy_cache_entry(struct cache_entry *dst, struct cache_entry *src)
+{
+	struct cache_entry *next = dst->next;
+	memcpy(dst, src, offsetof(struct cache_entry, name));
+	dst->next = next;
+}
+
+/*
+ * We don't actually *remove* it, we can just mark it invalid so that
+ * we won't find it in lookups.
+ *
+ * Not only would we have to search the lists (simple enough), but
+ * we'd also have to rehash other hash buckets in case this makes the
+ * hash bucket empty (common). So it's much better to just mark
+ * it.
+ */
+static inline void remove_index_entry(struct cache_entry *ce)
+{
+	ce->ce_flags |= CE_UNHASHED;
+}
+
 static inline unsigned create_ce_flags(size_t len, unsigned stage)
 {
 	if (len >= CE_NAMEMASK)
diff --git a/read-cache.c b/read-cache.c
index e45f4b3..e8e2be8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -39,6 +39,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	void **pos;
 	unsigned int hash = hash_name(ce->name, ce_namelen(ce));
 
+	ce->next = NULL;
 	pos = insert_hash(hash, ce, &istate->name_hash);
 	if (pos) {
 		ce->next = *pos;
@@ -64,26 +65,12 @@ static void set_index_entry(struct index_state *istate, int nr, struct cache_ent
 		hash_index_entry(istate, ce);
 }
 
-/*
- * We don't actually *remove* it, we can just mark it invalid so that
- * we won't find it in lookups.
- *
- * Not only would we have to search the lists (simple enough), but
- * we'd also have to rehash other hash buckets in case this makes the
- * hash bucket empty (common). So it's much better to just mark
- * it.
- */
-static void remove_hash_entry(struct index_state *istate, struct cache_entry *ce)
-{
-	ce->ce_flags |= CE_UNHASHED;
-}
-
 static void replace_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	struct cache_entry *old = istate->cache[nr];
 
 	if (ce != old) {
-		remove_hash_entry(istate, old);
+		remove_index_entry(old);
 		set_index_entry(istate, nr, ce);
 	}
 	istate->cache_changed = 1;
@@ -413,7 +400,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 {
 	struct cache_entry *ce = istate->cache[pos];
 
-	remove_hash_entry(istate, ce);
+	remove_index_entry(ce);
 	istate->cache_changed = 1;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
diff --git a/unpack-trees.c b/unpack-trees.c
index ec558f9..07c4c28 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -590,7 +590,7 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		 * a match.
 		 */
 		if (same(old, merge)) {
-			memcpy(merge, old, offsetof(struct cache_entry, name));
+			copy_cache_entry(merge, old);
 		} else {
 			verify_uptodate(old, o);
 			invalidate_ce_path(old);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFH] CE_REMOVE conversion
  2008-02-21 17:31     ` Linus Torvalds
@ 2008-02-21 20:06       ` Linus Torvalds
  2008-02-21 20:08       ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2008-02-21 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Thu, 21 Feb 2008, Linus Torvalds wrote:
> 
> I say "early" just because I haven't really gone through it more than 
> once, and testing that it passes all the tests.

Ok, here's an improved version. This is a bit larger, but having now 
thought the problems through, I think this is *much* better.

The thing is, we used to not be able to handle the case of removing a 
index entry and then re-inserting it, and we even had a rather ugly 
special case for that reason, where replace_index_entry() basically turned 
itself into a no-op if the new and the old entries were the same.

But the merging code really wants to remove a set of entries and then 
replace them with a new one, and that whole logic needed the hash lists to 
be handled *properly*.

So what this patch does is to not just have the UNHASHED bit, but a HASHED 
bit too, and when you insert an entry into the name hash, that involves:

 - clear the UNHASHED bit, because now it's valid again for lookup (which 
   is really all that UNHASHED meant)
 - if we're being lazy, we're done here (but we still want to clear the 
   UNHASHED bit regardless of lazy mode, since we can become unlazy later, 
   and so we need the UNHASHED bit to always be set correctly, even if we 
   never actually insert the entry into the hash list)
 - if it was already hashed, we just leave it on the list
 - otherwise mark it HASHED and insert it into the list

this all means that unhashing and rehashing a name all just works 
automatically. Obviously, you cannot change the name of an entry (that 
would be a serious bug), but nothing can validly do that anyway (you'd 
have to allocate a new struct cache_entry anyway since the name length 
could change), so that's not a new limitation.

The code actually gets simpler in many ways, although the lazy hashing 
does mean that there are a few odd cases (ie something can be marked 
unhashed even though it was never on the hash in the first place, and 
isn't actually marked hashed!).

I haven't actually added any code to verify that the name hash state 
matches the index state, but I feel pretty good about this patch anyway: 
it passes my "it makes sense" filters in ways that the previous one 
didn't.

		Linus

---
 builtin-read-tree.c |    3 ++-
 cache.h             |   35 ++++++++++++++++++++++++++++++++++-
 read-cache.c        |   30 ++++++++++--------------------
 unpack-trees.c      |    2 +-
 4 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 5785401..7bdc312 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -41,11 +41,12 @@ static int read_cache_unmerged(void)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce)) {
+			remove_index_entry(ce);
 			if (last && !strcmp(ce->name, last->name))
 				continue;
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
 			last = ce;
-			ce->ce_flags |= CE_REMOVE;
+			continue;
 		}
 		*dst++ = ce;
 	}
diff --git a/cache.h b/cache.h
index e1000bc..0ae33f4 100644
--- a/cache.h
+++ b/cache.h
@@ -133,7 +133,40 @@ struct cache_entry {
 #define CE_UPDATE    (0x10000)
 #define CE_REMOVE    (0x20000)
 #define CE_UPTODATE  (0x40000)
-#define CE_UNHASHED  (0x80000)
+
+#define CE_HASHED    (0x100000)
+#define CE_UNHASHED  (0x200000)
+
+/*
+ * Copy the sha1 and stat state of a cache entry from one to
+ * another. But we never change the name, or the hash state!
+ */
+#define CE_STATE_MASK (CE_HASHED | CE_UNHASHED)
+static inline void copy_cache_entry(struct cache_entry *dst, struct cache_entry *src)
+{
+	struct cache_entry *next = dst->next;
+	unsigned int state = dst->ce_flags & CE_STATE_MASK;
+
+	memcpy(dst, src, offsetof(struct cache_entry, name));
+
+	/* Restore the hash state */
+	dst->next = next;
+	dst->ce_flags = (dst->ce_flags & ~CE_STATE_MASK) | state;
+}
+
+/*
+ * We don't actually *remove* it, we can just mark it invalid so that
+ * we won't find it in lookups.
+ *
+ * Not only would we have to search the lists (simple enough), but
+ * we'd also have to rehash other hash buckets in case this makes the
+ * hash bucket empty (common). So it's much better to just mark
+ * it.
+ */
+static inline void remove_index_entry(struct cache_entry *ce)
+{
+	ce->ce_flags |= CE_UNHASHED;
+}
 
 static inline unsigned create_ce_flags(size_t len, unsigned stage)
 {
diff --git a/read-cache.c b/read-cache.c
index e45f4b3..fee0c80 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -37,8 +37,13 @@ static unsigned int hash_name(const char *name, int namelen)
 static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 {
 	void **pos;
-	unsigned int hash = hash_name(ce->name, ce_namelen(ce));
+	unsigned int hash;
 
+	if (ce->ce_flags & CE_HASHED)
+		return;
+	ce->ce_flags |= CE_HASHED;
+	ce->next = NULL;
+	hash = hash_name(ce->name, ce_namelen(ce));
 	pos = insert_hash(hash, ce, &istate->name_hash);
 	if (pos) {
 		ce->next = *pos;
@@ -59,33 +64,18 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
+	ce->ce_flags &= ~CE_UNHASHED;
 	istate->cache[nr] = ce;
 	if (istate->name_hash_initialized)
 		hash_index_entry(istate, ce);
 }
 
-/*
- * We don't actually *remove* it, we can just mark it invalid so that
- * we won't find it in lookups.
- *
- * Not only would we have to search the lists (simple enough), but
- * we'd also have to rehash other hash buckets in case this makes the
- * hash bucket empty (common). So it's much better to just mark
- * it.
- */
-static void remove_hash_entry(struct index_state *istate, struct cache_entry *ce)
-{
-	ce->ce_flags |= CE_UNHASHED;
-}
-
 static void replace_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	struct cache_entry *old = istate->cache[nr];
 
-	if (ce != old) {
-		remove_hash_entry(istate, old);
-		set_index_entry(istate, nr, ce);
-	}
+	remove_index_entry(old);
+	set_index_entry(istate, nr, ce);
 	istate->cache_changed = 1;
 }
 
@@ -413,7 +403,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 {
 	struct cache_entry *ce = istate->cache[pos];
 
-	remove_hash_entry(istate, ce);
+	remove_index_entry(ce);
 	istate->cache_changed = 1;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
diff --git a/unpack-trees.c b/unpack-trees.c
index ec558f9..07c4c28 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -590,7 +590,7 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		 * a match.
 		 */
 		if (same(old, merge)) {
-			memcpy(merge, old, offsetof(struct cache_entry, name));
+			copy_cache_entry(merge, old);
 		} else {
 			verify_uptodate(old, o);
 			invalidate_ce_path(old);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFH] CE_REMOVE conversion
  2008-02-21 17:31     ` Linus Torvalds
  2008-02-21 20:06       ` Linus Torvalds
@ 2008-02-21 20:08       ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2008-02-21 20:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> It includes your "continue" fix, since that was actually related to the 
> "remove_index_entry()" addition this needs. It also contains the hash next 
> pointer initialization.

In the meantime, 'master' will get that single-liner "continue"
fix, which may conflict but the resolution is trivial.

> But I'm still looking at this. In particular, I want to add some 
> assertions to make sure the index state matches the name hash state, but 
> your lazy patch makes that less convenient.

Please feel free to revert it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-02-21 20:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-21  8:39 [RFH] CE_REMOVE conversion Junio C Hamano
2008-02-21  9:07 ` [RFH] index_name_exists() conversion Junio C Hamano
2008-02-21 16:15   ` Linus Torvalds
2008-02-21 16:29     ` Johannes Schindelin
2008-02-21 17:00     ` Linus Torvalds
2008-02-21 16:05 ` [RFH] CE_REMOVE conversion Linus Torvalds
2008-02-21 16:43   ` Junio C Hamano
2008-02-21 17:31     ` Linus Torvalds
2008-02-21 20:06       ` Linus Torvalds
2008-02-21 20:08       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).