From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	git@vger.kernel.org
Subject: Re: sparse fetch, was Re: [PATCH 08/12] git-clone: support --path to do sparse clone
Date: Fri, 25 Jul 2008 01:47:03 -0700
Message-ID: <7v7ibauz94.fsf@gitster.siamese.dyndns.org>
In-Reply-To: 20080724182813.GA21186@sigill.intra.peff.net

Jeff King <peff@peff.net> writes:

> On Thu, Jul 24, 2008 at 06:41:03PM +0100, Johannes Schindelin wrote:
>
>> > As a user, I would expect "sparse clone" to also be sparse on the 
>> > fetching. That is, to not even bother fetching tree objects that we are 
>> > not going to check out. But that is a whole other can of worms from 
>> > local sparseness, so I think it is worth saving for a different series.
>> 
>> I think this is not even worth a series.  Sure, it would have benefits
>> for those who want sparse checkouts.  But it comes at a high price for
>> everyone else:
>
> I agree there are a lot of issues. I am just thinking of the person who
> said they had a >100G repository. But I am also not volunteering to do
> it, so I will let somebody who really cares about it try to defend the
> idea.

I think sparse fetch is a lot worse than grafts and shallow clones, which
are already bad.  These are all ways to introduce local inconsistency at
the object level and pretend everything is OK, but the latter two do so
only at commit boundaries, which is somewhat more manageable (although we
still do not handle it very well).  With sparse fetch, you cannot even
guarantee the integrity of an individual commit, because subtrees here and
there are missing.
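
To make the integrity point concrete, here is a toy model in Python.  It
is an illustration only: the dictionary object store, the object ids and
`missing_objects` are hypothetical and are not git's object format or its
fsck code.  It walks everything reachable from a root tree and reports what
is absent after a "sparse fetch" of src/ only.

```python
from typing import Dict, List, Tuple

# Hypothetical object store: id -> ("tree", [(name, child_id), ...])
# or ("blob", content).  Real git stores hashed, compressed objects.
Store = Dict[str, Tuple[str, object]]


def missing_objects(store: Store, root: str) -> List[str]:
    """Walk everything reachable from `root`; return ids absent from `store`."""
    missing: List[str] = []
    stack = [root]
    seen = set()
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in store:
            # Reachable but absent: the commit's tree is incomplete, and
            # we cannot even see what lies below this object.
            missing.append(oid)
            continue
        kind, payload = store[oid]
        if kind == "tree":
            for _name, child in payload:
                stack.append(child)
    return sorted(missing)


full: Store = {
    "root": ("tree", [("src", "t-src"), ("doc", "t-doc")]),
    "t-src": ("tree", [("main.c", "b1")]),
    "t-doc": ("tree", [("README", "b2")]),
    "b1": ("blob", "int main(void) { return 0; }"),
    "b2": ("blob", "docs"),
}

# A "sparse fetch" of src/ only: the doc/ subtree and its blob never arrive.
sparse: Store = {k: v for k, v in full.items() if k not in ("t-doc", "b2")}

print(missing_objects(full, "root"))    # []
print(missing_objects(sparse, "root"))  # ['t-doc']
```

Note that `b2` is not even reported for the sparse store: once `t-doc` is
missing, nothing below it can be enumerated, let alone verified.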

I do think a sparse checkout that says "I'll have the whole tree in the
index, but the work tree will have only these paths checked out" makes
sense.  You do not need a fully populated work tree to create commits or
merges; the absolute minimum you need is a fully populated index.
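
The claim that a commit needs only a full index, not a full work tree, can
be sketched as a toy model.  Everything here is an assumption for
illustration: `index`, `worktree`, `stage` and `write_tree` are
hypothetical stand-ins, and the blob id is a shortened SHA-1 of the file
content rather than git's real "blob <len>\0" object hash.

```python
import hashlib
from typing import Dict

# Hypothetical index: path -> blob id.  It holds EVERY path in the tree.
index: Dict[str, str] = {"src/main.c": "b1", "doc/README": "b2"}

# Hypothetical work tree: only paths inside the sparse area (src/) exist.
worktree: Dict[str, str] = {"src/main.c": "int main(void) { return 1; }\n"}


def stage(path: str) -> None:
    """Roughly what `git add` does: hash the work tree file, update the index."""
    data = worktree[path].encode()
    index[path] = "blob:" + hashlib.sha1(data).hexdigest()[:7]


def write_tree(idx: Dict[str, str]) -> dict:
    """Build a nested tree from index entries only; the work tree is never read."""
    tree: dict = {}
    for path, blob in sorted(idx.items()):
        *dirs, name = path.split("/")
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = blob
    return tree


stage("src/main.c")
tree = write_tree(index)
print(tree["doc"])  # {'README': 'b2'}: doc/ survives although never checked out
```

The tree written for the commit still contains doc/README, taken straight
from the index, even though that file never existed in the work tree.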

In that sense, I think "protect index entries outside of these paths" (I
remember that the first round of this series was built around that notion)
is the wrong way to approach this.  We should think of it more as "you
still populate the index with the whole tree, and you are free to update
those entries in any way you want, but we do not touch the work tree
outside these areas".
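
A minimal sketch of that rule, assuming the same toy dictionaries for the
index and work tree: the hypothetical `checkout` below replaces the whole
index with the target tree but writes or deletes work tree files only for
paths under the sparse prefixes.  Prefixes carry a trailing "/" so that
"src/" does not also match a path such as "srcfoo/x".

```python
from typing import Dict, Iterable


def in_sparse_area(path: str, prefixes: Iterable[str]) -> bool:
    """True if `path` lies under one of the sparse prefixes (e.g. "src/")."""
    return any(path.startswith(p) for p in prefixes)


def checkout(target: Dict[str, str], index: Dict[str, str],
             worktree: Dict[str, str], prefixes: Iterable[str]) -> None:
    """Move to `target` (path -> blob id).

    The index is updated wholesale; the work tree is only touched for
    paths inside the sparse area.
    """
    pfx = list(prefixes)
    for path in set(index) - set(target):
        del index[path]
        if in_sparse_area(path, pfx):
            worktree.pop(path, None)
    for path, blob in target.items():
        index[path] = blob
        if in_sparse_area(path, pfx):
            worktree[path] = f"<contents of {blob}>"  # placeholder content


index = {"src/a.c": "b1", "doc/x.txt": "b2"}
worktree = {"src/a.c": "<contents of b1>"}
checkout({"src/a.c": "b3", "doc/x.txt": "b4", "doc/y.txt": "b5"},
         index, worktree, ["src/"])
print(sorted(index))     # ['doc/x.txt', 'doc/y.txt', 'src/a.c']
print(sorted(worktree))  # ['src/a.c']
```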

This has a few ramifications:

 - If the user can somehow check out a path outside the "sparse" area, it
   is perfectly fine for the user to edit and "git add" it.  Any such method
   of checking out a path outside the "sparse" area is a way to widen the
   "sparse" area the user originally set up;

 - When the user runs "merge", and it needs to present the user with a
   working tree file because of conflicts at the file level, the user has
   to agree to widen the "sparse" area before the merge can do so.  One way
   to handle this is to refuse and fail the merge (and then the user needs
   to widen the "sparse" area first, in that "unspecified way").  Another
   way would be to automatically widen the "sparse" area to include such
   conflicting paths.

 - And you would want to narrow it down after you do such a widening.
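
The three ramifications can be sketched together.  This is a hypothetical
model, not a git data structure: `SparseArea` keeps directory prefixes
(ending in "/") and exact file paths, `widen` and `narrow` adjust them,
and `present_conflicts` shows the two merge policies described above,
refusing the merge or widening the area automatically.

```python
from typing import List, Set


class MergeRefused(Exception):
    """Raised when a conflict lies outside the sparse area (policy A)."""


class SparseArea:
    """Hypothetical bookkeeping of the sparse area."""

    def __init__(self, prefixes: List[str]) -> None:
        self.entries: Set[str] = set(prefixes)

    def contains(self, path: str) -> bool:
        # "dir/" entries match by prefix; other entries match exactly.
        return any(path == e or (e.endswith("/") and path.startswith(e))
                   for e in self.entries)

    def widen(self, path: str) -> None:
        if not self.contains(path):
            self.entries.add(path)

    def narrow(self, entry: str) -> None:
        self.entries.discard(entry)


def present_conflicts(area: SparseArea, conflicts: List[str],
                      auto_widen: bool) -> List[str]:
    """Return the conflicted paths to write out to the work tree.

    Policy A (auto_widen=False): refuse if any conflict is outside the area.
    Policy B (auto_widen=True): widen the area to cover each such path.
    """
    outside = [p for p in conflicts if not area.contains(p)]
    if outside and not auto_widen:
        raise MergeRefused(f"widen the sparse area first: {outside}")
    for p in outside:
        area.widen(p)
    return conflicts


area = SparseArea(["src/"])
try:
    present_conflicts(area, ["src/a.c", "doc/x.txt"], auto_widen=False)
except MergeRefused as err:
    print("refused:", err)
present_conflicts(area, ["src/a.c", "doc/x.txt"], auto_widen=True)
widened = area.contains("doc/x.txt")   # True after automatic widening
area.narrow("doc/x.txt")               # narrow back once the conflict is done
narrowed = area.contains("doc/x.txt")  # False again
```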

For many projects that have src/ and doc/ (git.git being one of them), it
is perfectly valid for a code person and a doc person to work in tandem.
In such a project, after the code person makes changes only to the src/
area in her sparsely checked out repository and pushes the results out,
the doc person would run "git pull && git log -p ORIG_HEAD.." and update
the documentation in his sparsely checked out repository, which has only
the doc/ area.  The two parts are tied together and advance more or less
in sync.  I think sparse checkout would be a useful feature to support
such a configuration.

Having said that, I think this can easily be misused for a CVS-style
layout where "one CVSROOT houses millions of totally unrelated projects".
In CVS, that layout is perfectly fine because the system does not track
changes at any level higher than individual files, but when you naïvely
map the layout onto a system with tree-wide atomic commits, such as git,
it defeats the whole point of using such a system.  The paces at which
these millions of unrelated projects advance have no relationship to each
other, but by tying them together in the same top-level tree, the layout
introduces an unnecessary ordering between their commits.
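
A toy count makes the "unnecessary ordering" concrete.  It assumes
linear histories and hypothetical commit names whose first letter names
the project (A or B); real histories are DAGs, but the effect is the same
in kind.  Kept in separate repositories, only commits within a project are
ordered; interleaved in one shared history, every A commit is also ordered
against every B commit, which carries no meaning.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[str, str]


def ancestry_pairs(history: List[str]) -> List[Pair]:
    """All (older, newer) pairs that a linear history puts in ancestor order."""
    return list(combinations(history, 2))


def cross_project(pairs: List[Pair]) -> List[Pair]:
    """Pairs whose commits belong to different projects (first letter)."""
    return [(x, y) for x, y in pairs if x[0] != y[0]]


separate = ancestry_pairs(["A1", "A2", "A3"]) + ancestry_pairs(["B1", "B2"])
shared = ancestry_pairs(["A1", "B1", "A2", "A3", "B2"])

print(len(separate), len(cross_project(separate)))  # 4 0
print(len(shared), len(cross_project(shared)))      # 10 6
```

The six extra pairs in the shared history are exactly the ordering between
unrelated projects that the tree-wide commit imposes.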
