All of lore.kernel.org
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org, Duy Nguyen <pclouds@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Paul-Sebastian Ungureanu" <ungureanupaulsebastian@gmail.com>
Subject: Re: [PATCH v2 4/5] split-index: don't compare stat data of entries already marked for split index
Date: Thu, 27 Sep 2018 15:43:24 +0200	[thread overview]
Message-ID: <20180927134324.GI27036@localhost> (raw)
In-Reply-To: <20180927124434.30835-5-szeder.dev@gmail.com>

On Thu, Sep 27, 2018 at 02:44:33PM +0200, SZEDER Gábor wrote:
>  split-index.c | 79 ++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 62 insertions(+), 17 deletions(-)

I generated this patch with more context lines than usual, so the two
conditions that I didn't add any comments to in this or in the next
patch are fully visible.

> diff --git a/split-index.c b/split-index.c
> index 548272ec33..7d8799f6b7 100644
> --- a/split-index.c
> +++ b/split-index.c
> @@ -204,19 +204,34 @@ void prepare_to_write_split_index(struct index_state *istate)
>  		 * that are not marked with either CE_MATCHED or
>  		 * CE_UPDATE_IN_BASE. If istate->cache[i] is a
>  		 * duplicate, deduplicate it.
>  		 */
>  		for (i = 0; i < istate->cache_nr; i++) {
>  			struct cache_entry *base;
> -			/* namelen is checked separately */
> -			const unsigned int ondisk_flags =
> -				CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
> -			unsigned int ce_flags, base_flags, ret;
>  			ce = istate->cache[i];
> -			if (!ce->index)
> +			if (!ce->index) {
> +				/*
> +				 * During simple update index operations this
> +				 * is a cache entry that is not present in
> +				 * the shared index.  It will be added to the
> +				 * split index.
> +				 *
> +				 * However, it might also represent a file
> +				 * that already has a cache entry in the
> +				 * shared index, but a new index has just
> +				 * been constructed by unpack_trees(), and
> +				 * this entry now refers to different content
> +				 * than what was recorded in the original
> +				 * index, e.g. during 'read-tree -m HEAD^' or
> +				 * 'checkout HEAD^'.  In this case the
> +				 * original entry in the shared index will be
> +				 * marked as deleted, and this entry will be
> +				 * added to the split index.
> +				 */
>  				continue;
> +			}
>  			if (ce->index > si->base->cache_nr) {
>  				ce->index = 0;
>  				continue;
>  			}

This condition in the context above checks whether a cache entry
refers to a non-existing entry in the shared index.

I don't understand the role of this condition, for two reasons:

  - Under what circumstances can this condition be ever fulfilled?

    I instrumented it and run the test suite repeatedly with
    'GIT_TEST_SPLIT_INDEX=yes', but it has never been fulfilled.  I
    also tried to come up with all kinds of elaborate scenarios to
    trigger it, but no joy, and code inspection didn't bring anything
    either.

  - There are similar conditions in 'split-index.c' in the functions
    mark_entry_for_delete() and replace_entry(); here is the one from
    the latter, but they only differ in the error message:

      if (pos >= istate->cache_nr)
          die("position for replacement %d exceeds base index size %d",
              (int)pos, istate->cache_nr);

    (Note that this 'istate->cache_nr' here equals
    to 'si->base->cache_nr'; see their caller merge_base_index().)

    The die() clearly indicates that fulfilling this condition is a
    Bad Thing.  These two functions are invoked to create a unified
    view of the just read split and shared indexes, so the fulfillment
    of this condition could indicate a corrupt index file, and
    die()ing right away seems to be justified.

    Then why doesn't the condition in prepare_to_write_split_index()
    die() as well?!  After all if it were fulfilled, then it would
    indicate a corruption in the current index_state, and writing a
    new split index from corrupt data doesn't seem like a particularly
    good idea.


>  			ce->ce_flags |= CE_MATCHED; /* or "shared" */
>  			base = si->base->cache[ce->index - 1];
> @@ -224,24 +239,54 @@ void prepare_to_write_split_index(struct index_state *istate)
>  				continue;
>  			if (ce->ce_namelen != base->ce_namelen ||
>  			    strcmp(ce->name, base->name)) {
>  				ce->index = 0;
>  				continue;
>  			}

I don't understand the role of this condition either, and just like
the one discussed above, the test suite with
'GIT_TEST_SPLIT_INDEX=yes' seems to never fulfill it.

> -			ce_flags = ce->ce_flags;
> -			base_flags = base->ce_flags;
> -			/* only on-disk flags matter */
> -			ce->ce_flags   &= ondisk_flags;
> -			base->ce_flags &= ondisk_flags;
> -			ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
> -				     offsetof(struct cache_entry, name) -
> -				     offsetof(struct cache_entry, ce_stat_data));
> -			ce->ce_flags = ce_flags;
> -			base->ce_flags = base_flags;
> -			if (ret)
> -				ce->ce_flags |= CE_UPDATE_IN_BASE;
> +			/*
> +			 * This is the copy of a cache entry that is present
> +			 * in the shared index, created by unpack_trees()
> +			 * while it constructed a new index.
> +			 */
> +			if (ce->ce_flags & CE_UPDATE_IN_BASE) {
> +				/*
> +				 * Already marked for inclusion in the split
> +				 * index, either because the corresponding
> +				 * file was modified and the cached stat data
> +				 * was refreshed, or because the original
> +				 * entry already had a replacement entry in
> +				 * the split index.
> +				 * Nothing to do.
> +				 */
> +			} else {
> +				/*
> +				 * Thoroughly compare the cached data to see
> +				 * whether it should be marked for inclusion
> +				 * in the split index.
> +				 *
> +				 * This comparison might be unnecessary, as
> +				 * code paths modifying the cached data do
> +				 * set CE_UPDATE_IN_BASE as well.
> +				 */
> +				const unsigned int ondisk_flags =
> +					CE_STAGEMASK | CE_VALID |
> +					CE_EXTENDED_FLAGS;
> +				unsigned int ce_flags, base_flags, ret;
> +				ce_flags = ce->ce_flags;
> +				base_flags = base->ce_flags;
> +				/* only on-disk flags matter */
> +				ce->ce_flags   &= ondisk_flags;
> +				base->ce_flags &= ondisk_flags;
> +				ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
> +					     offsetof(struct cache_entry, name) -
> +					     offsetof(struct cache_entry, ce_stat_data));
> +				ce->ce_flags = ce_flags;
> +				base->ce_flags = base_flags;
> +				if (ret)
> +					ce->ce_flags |= CE_UPDATE_IN_BASE;
> +			}
>  			discard_cache_entry(base);
>  			si->base->cache[ce->index - 1] = ce;
>  		}
>  		for (i = 0; i < si->base->cache_nr; i++) {
>  			ce = si->base->cache[i];
>  			if ((ce->ce_flags & CE_REMOVE) ||
> -- 
> 2.19.0.361.gafc87ffe72
> 

  reply	other threads:[~2018-09-27 13:43 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-27 12:44 [PATCH v2 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 1/5] split-index: add tests to demonstrate " SZEDER Gábor
2018-09-28  0:48   ` SZEDER Gábor
2018-09-28  2:40     ` SZEDER Gábor
2018-09-28 17:30     ` Junio C Hamano
2018-09-27 12:44 ` [PATCH v2 2/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 3/5] split-index: count the number of deleted entries SZEDER Gábor
2018-09-27 12:44 ` [PATCH v2 4/5] split-index: don't compare stat data of entries already marked for split index SZEDER Gábor
2018-09-27 13:43   ` SZEDER Gábor [this message]
2018-09-27 12:44 ` [PATCH v2 5/5] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-09-27 13:53 ` [PATCH v2 0/5] Fix the racy split index problem Ævar Arnfjörð Bjarmason
2018-09-27 14:23   ` SZEDER Gábor
2018-09-27 15:25     ` Ævar Arnfjörð Bjarmason
2018-09-28  6:57       ` Ævar Arnfjörð Bjarmason
2018-09-28 10:17         ` SZEDER Gábor
2018-10-08 14:54         ` Ævar Arnfjörð Bjarmason
2018-10-08 15:41           ` SZEDER Gábor
2018-09-28 16:24 ` [PATCH v3 0/6] " SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 1/6] t1700-split-index: document why FSMONITOR is disabled in this test script SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 2/6] split-index: add tests to demonstrate the racy split index problem SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 3/6] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 4/6] split-index: count the number of deleted entries SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 5/6] split-index: don't compare stat data of entries already marked for split index SZEDER Gábor
2018-09-29  5:36     ` Duy Nguyen
2018-09-29  9:14       ` SZEDER Gábor
2018-09-29 10:07         ` SZEDER Gábor
2018-09-28 16:24   ` [PATCH v3 6/6] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-09-29  5:21     ` Duy Nguyen
2018-09-29  7:57       ` SZEDER Gábor
2018-09-30 14:47   ` [PATCH v3 0/6] Fix the racy split index problem SZEDER Gábor
2018-10-05  6:15     ` Junio C Hamano
2018-10-11  9:43   ` [PATCH v4 " SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 1/6] t1700-split-index: document why FSMONITOR is disabled in this test script SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 2/6] split-index: add tests to demonstrate the racy split index problem SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 3/6] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 4/6] split-index: count the number of deleted entries SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 5/6] split-index: don't compare cached data of entries already marked for split index SZEDER Gábor
2018-10-11  9:43     ` [PATCH v4 6/6] split-index: smudge and add racily clean cache entries to " SZEDER Gábor
2018-10-11  9:53     ` [PATCH 7/6] split-index: BUG() when cache entry refers to non-existing shared entry SZEDER Gábor
2018-10-11 10:36     ` [PATCH v4 0/6] Fix the racy split index problem Ævar Arnfjörð Bjarmason
2018-10-11 11:38       ` SZEDER Gábor
2018-10-12  3:20       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180927134324.GI27036@localhost \
    --to=szeder.dev@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=t.gummerer@gmail.com \
    --cc=ungureanupaulsebastian@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.