git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Rast <trast@inf.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	git@vger.kernel.org, mhagger@alum.mit.edu,
	robin.rosenberg@dewire.com
Subject: Re: [PATCH/RFC v2 0/16] Introduce index file format version 5
Date: Wed, 8 Aug 2012 11:07:21 +0200	[thread overview]
Message-ID: <87628tsqeu.fsf@thomas.inf.ethz.ch> (raw)
In-Reply-To: <7vboimuvur.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Tue, 7 Aug 2012 16:26:52 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>> I like the general idea, too, but I think there is a long way ahead, and
>> we shouldn't hold up v5 on this.
>
> We shouldn't rush, only to keep some deadline, and regret it later
> that we butchered the index format without thinking things through.
> When this was added to the GSoC idea page, I already said upfront
> that this was way too big a topic to be a GSoC project, didn't I?

Let me spell out my concern.  There are two v5s here:

* The extent of the GSoC task.

* The eventual implementation of index-v5 that goes into Git mainline.

IMHO this thread is mixing up the two.  There indeed must not be any
rush in the final implementation of index-v5.  However, the GSoC ends in
less than two weeks, and I have to evaluate Thomas on whatever is
finished until then.

AFAIK Thomas is now cleaning up the existing code to be in readable
shape, using your feedback, which is great.  However, the above
suggestion is such a fuzzily-specified task that there is no way to even
find out what needs to be done within the next two weeks.  Perhaps it
makes sense, at this point, to wrap anything that ended up having _v[25]
suffixes in an index_ops like Duy did.  That's a long way from actually
following through on the idea, though.

> [...] "The new on-disk format is different from
> the current one, and as it is different from the current one, we can
> easily enhance it even more by hooking anything interesting to it!"
> does not sound like a valid argument.  
>
>> For example, for v5 it
>> would be far better if conflicted and resolve-undo entries were a
>> property of the normal index entry, instead of something that so happens
>> to be consecutive entries and in a completely different place,
>> respectively.
>
> I am not sure I am convinced.  Conflicts are already expressed by an
> attribute on a normal index entry (it is called "stage"), and
> because we check for "is the index fully merged" fairly often, it
> makes sense to have it in each entry.  Actually having an unmerged
> entry is a rare event (happens only during a mergy operation that
> gave control back to you), so we do not lose much by expressing them
> as consecutive entries.  Resolve-undo is far less often used, and is
> not an essential feature, so it makes perfect sense to have it as an
> optional index extension to allow versions of Git that are unaware
> of it to still use an index file that has it.

I picked this example because in the big picture, the current code goes
to silly contortions to shuffle data around.  Conflicts and resolve-undo
entries are two faces of the same coin, but the code does not express
this at all.  Whenever the user resolves a conflict, it removes the
existing index entries (consecutive in a flat table) and inserts them in
the resolve-undo tree (tree-shaped where every entry has all stages
embedded).  When using 'checkout -m' to recover the conflict, it goes
the other way.

v5 would simplify this: the difference between a conflict and a
resolve-undo entry is only one bit.  But because it needs to maintain v2
compatiblity, it first untangles the mixed conflict/resolve-undo data
and puts them in the right format, then later reassembles them.

So "v5 could do it faster if all the code were written for it" is only
half of it.  v5's data layout would also result in simpler data flow,
but as long as it is not allowed to exploit this, it's actually *more*
layers of complexity.

I think the part you snipped

>> the loops that iterate over the index [...] either
>> skip unmerged entries or specifically look for them.  There are subtle
>> differences between the loops on many points: what do they do when they
>> hit an unmerged entry?  Or a CE_REMOVED or CE_VALID one?

is a symptom of the same general problem: the data structures are sound,
but they are leaking all over the code and now we have lots of
complexity to do even simple operations like "for each unmerged entry".

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  reply	other threads:[~2012-08-08  9:07 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-05 21:48 [PATCH/RFC v2 0/16] Introduce index file format version 5 Thomas Gummerer
2012-08-05 21:48 ` [PATCH/RFC v2 01/16] Modify cache_header to prepare for other index formats Thomas Gummerer
2012-08-06  1:17   ` Junio C Hamano
2012-08-07 12:41     ` Thomas Gummerer
2012-08-07 15:45       ` Junio C Hamano
2012-08-05 21:48 ` [PATCH/RFC v2 02/16] Modify read functions " Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 03/16] Modify match_stat_basic " Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 04/16] Modify write functions " Thomas Gummerer
2012-08-06  1:34   ` Junio C Hamano
2012-08-07 12:50     ` Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 05/16] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2012-08-06  1:36   ` Junio C Hamano
2012-08-05 21:49 ` [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code Thomas Gummerer
2012-08-06  1:43   ` Junio C Hamano
2012-08-07 16:59     ` Thomas Gummerer
2012-08-08 20:16       ` Junio C Hamano
2012-08-08 20:57         ` Junio C Hamano
2012-08-09 13:19           ` Thomas Gummerer
2012-08-09 16:51             ` Junio C Hamano
2012-08-09 22:51               ` Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 07/16] Add documentation of the index-v5 file format Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 08/16] Make in-memory format aware of stat_crc Thomas Gummerer
2012-08-06  1:46   ` Junio C Hamano
2012-08-07 19:02     ` Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 09/16] Read index-v5 Thomas Gummerer
2012-08-06  5:17   ` Junio C Hamano
2012-08-08  7:41     ` Thomas Gummerer
2012-08-08 16:49       ` Junio C Hamano
2012-08-08 20:44         ` Thomas Gummerer
2012-08-08 21:50           ` Junio C Hamano
2012-08-05 21:49 ` [PATCH/RFC v2 10/16] Read resolve-undo data Thomas Gummerer
2012-08-06  1:51   ` Junio C Hamano
2012-08-07 19:17     ` Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 11/16] Read cache-tree in index-v5 Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 12/16] Write index-v5 Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 13/16] Write index-v5 cache-tree data Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 14/16] Write resolve-undo data for index-v5 Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 15/16] update-index.c: add a force-rewrite option Thomas Gummerer
2012-08-06  1:58   ` Junio C Hamano
2012-08-08  7:31     ` Thomas Gummerer
2012-08-05 21:49 ` [PATCH/RFC v2 16/16] p0002-index.sh: add perf test for the index formats Thomas Gummerer
2012-08-06 14:35 ` [PATCH/RFC v2 0/16] Introduce index file format version 5 Nguyễn Thái Ngọc Duy
2012-08-06 14:35   ` [PATCH 1/2] Move index v2 specific code out of read-cache Nguyễn Thái Ngọc Duy
2012-08-06 14:36   ` [PATCH 2/2] Add index-v5 Nguyễn Thái Ngọc Duy
2012-08-07 21:52     ` Robin Rosenberg
2012-08-08 10:54       ` Thomas Gummerer
2012-08-06 15:51   ` [PATCH/RFC v2 0/16] Introduce index file format version 5 Junio C Hamano
2012-08-06 16:06     ` Thomas Gummerer
2012-08-06 17:46   ` Junio C Hamano
2012-08-07 12:16     ` Nguyen Thai Ngoc Duy
2012-08-08  1:38       ` Junio C Hamano
2012-08-08 13:54         ` Nguyen Thai Ngoc Duy
2012-08-08 16:31           ` Junio C Hamano
2012-08-09  2:28             ` Nguyen Thai Ngoc Duy
2012-08-07 22:31     ` Thomas Rast
2012-08-07 23:26       ` Junio C Hamano
2012-08-08  9:07         ` Thomas Rast [this message]
2012-08-08 22:47           ` Junio C Hamano
2012-08-08 10:30       ` Nguyen Thai Ngoc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87628tsqeu.fsf@thomas.inf.ethz.ch \
    --to=trast@inf.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    --cc=t.gummerer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).