git.vger.kernel.org archive mirror
From: Jeff Hostetler <git@jeffhostetler.com>
To: Vitaly Arbuzov <vit@uber.com>, Philip Oakley <philipoakley@iee.org>
Cc: Git List <git@vger.kernel.org>
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Fri, 1 Dec 2017 09:30:31 -0500
Message-ID: <bac032c8-b9c2-4520-58e5-d518f4efd9d6@jeffhostetler.com>
In-Reply-To: <CANxXvsNuEmo+uaRY8t44csqzXAk3rS+D9E=LMvaLcZeg-aLvRw@mail.gmail.com>



On 11/30/2017 8:51 PM, Vitaly Arbuzov wrote:
> I think it would be great if we high level agree on desired user
> experience, so let me put a few possible use cases here.
> 
> 1. Init and fetch into a new repo with a sparse list.
> Preconditions: origin blah exists and has a lot of folders inside of
> src including "bar".
> Actions:
> git init foo && cd foo
> git config core.sparseAll true # New flag to activate all sparse
> operations by default so you don't need to pass options to each
> command.
> echo "src/bar" > .git/info/sparse-checkout
> git remote add origin blah
> git pull origin master
> Expected results: foo contains src/bar folder and nothing else,
> objects that are unrelated to this tree are not fetched.
> Notes: This should work the same when fetch/merge/checkout operations
> are used in the right order.

With the current patches (parts 1, 2, and 3), we can pass a blob-ish
to the server during a clone that refers to a sparse-checkout
specification.  There's a bit of a chicken-and-egg problem getting
things set up.  So if we assume your team would create a series
of "known enlistments" under version control, then you could
just reference one by <branch>:<path> during your clone.  The
server can look up that blob and just use it.

     git clone --filter=sparse:oid=master:templates/bar URL

And then the server will filter out the unwanted blobs during
the clone.  (The current version only filters blobs; you still
get full commits and trees.  That will be revisited later.)
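
For illustration, here is a minimal sketch of such a clone against a
throwaway local repository.  It assumes a Git release in which partial
clone has shipped and a server side that sets uploadpack.allowFilter;
the repository layout, the file names, and the $work directory are all
hypothetical.  --no-checkout is added only so that the checkout step
does not fetch the omitted blobs on demand.

```shell
# Hypothetical scratch area; nothing here comes from a real project.
work=$(mktemp -d)

# Build a tiny "server" repo with one wanted and one unwanted subtree,
# plus a sparse-checkout specification kept under version control.
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
mkdir -p "$work/server/src/bar" "$work/server/src/other" "$work/server/templates"
echo "wanted"   > "$work/server/src/bar/a.txt"
echo "unwanted" > "$work/server/src/other/b.txt"
echo "src/bar"  > "$work/server/templates/bar"
git -C "$work/server" add -A
git -C "$work/server" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "initial"

# The serving side must opt in to filtered fetches.
git -C "$work/server" config uploadpack.allowFilter true

# Partial clone: the server resolves master:templates/bar and omits
# blobs that the specification does not select.
git clone -q --no-checkout --filter=sparse:oid=master:templates/bar \
    "file://$work/server" "$work/client"

# List objects reachable from HEAD; lines starting with "?" are objects
# the client does not have locally.
missing=$(git -C "$work/client" rev-list --objects --missing=print HEAD \
    | grep '^?' || true)
wanted=$(git -C "$work/server" rev-parse master:src/bar/a.txt)
unwanted=$(git -C "$work/server" rev-parse master:src/other/b.txt)
echo "missing objects:"
echo "$missing"
```

Commits and trees are all present in the client; only blobs outside
src/bar show up in the missing list.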

On the client side, the partial clone installs local config
settings into the repo so that subsequent fetches default to
the same filter criteria as used in the clone.
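
As a rough sketch, the recorded settings can be inspected after a
partial clone.  The config key names used here
(remote.origin.promisor and remote.origin.partialclonefilter) are the
ones found in released versions of Git and are an assumption relative
to the patch series discussed in this thread, which may store them
under different names.  The repository contents and $work are
hypothetical.

```shell
# Hypothetical scratch area and repository contents.
work=$(mktemp -d)

git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
mkdir -p "$work/server/src/bar" "$work/server/templates"
echo "wanted"  > "$work/server/src/bar/a.txt"
echo "src/bar" > "$work/server/templates/bar"
git -C "$work/server" add -A
git -C "$work/server" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "initial"
git -C "$work/server" config uploadpack.allowFilter true

git clone -q --no-checkout --filter=sparse:oid=master:templates/bar \
    "file://$work/server" "$work/client"

# Settings written by the clone (key names assumed from released Git);
# later fetches from origin pick up the filter from here.
promisor=$(git -C "$work/client" config remote.origin.promisor)
filter=$(git -C "$work/client" config remote.origin.partialclonefilter)
echo "promisor: $promisor"
echo "filter:   $filter"
```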


I don't currently have a provision to send a full sparse-checkout
specification to the server during a clone or fetch.  That
seemed like too much to try to squeeze into the protocols.
We can revisit this later if there is interest, but it wasn't
critical for the initial phase.


> 
> 2. Add a file and push changes.
> Preconditions: all steps above followed.
> touch src/bar/baz.txt && git add -A && git commit -m "added a file"
> git push origin master
> Expected results: changes are pushed to remote.

I don't believe partial clone and/or partial fetch will cause
any changes for push.


> 
> 3. Clone a repo with a sparse list as a filter.
> Preconditions: same as for #1
> Actions:
> echo "src/bar" > /tmp/blah-sparse-checkout
> git clone --sparse /tmp/blah-sparse-checkout blah # Clone should be
> the only command that would require a specific option key to be passed.
> Expected results: same as for #1 plus /tmp/blah-sparse-checkout is
> copied into .git/info/sparse-checkout

There are 2 independent concepts here: clone and checkout.
Currently, there isn't any automatic linkage of the partial clone to
the sparse-checkout settings, so you could do something like this:

     git clone --no-checkout --filter=sparse:oid=master:templates/bar URL
     git cat-file ... templates/bar >.git/info/sparse-checkout
     git config core.sparsecheckout true
     git checkout ...

I've been focused on the clone/fetch issues and have not looked
into the automation to couple them.
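
One concrete way the manual sequence above might look, run against a
throwaway repository.  This is only a sketch: the arguments chosen for
cat-file and checkout are illustrative picks, not the elided ones from
the commands above, and the repository layout and $work are
hypothetical.  It also assumes the server allows the client to fetch
the specification blob on demand, since that blob is itself filtered
out of the clone.

```shell
# Hypothetical scratch area and repository contents.
work=$(mktemp -d)

git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
mkdir -p "$work/server/src/bar" "$work/server/src/other" "$work/server/templates"
echo "wanted"   > "$work/server/src/bar/a.txt"
echo "unwanted" > "$work/server/src/other/b.txt"
echo "src/bar"  > "$work/server/templates/bar"
git -C "$work/server" add -A
git -C "$work/server" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "initial"
# Allow filtered fetches and on-demand fetches of individual objects.
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# 1. Partial clone without populating the working tree.
git clone -q --no-checkout --filter=sparse:oid=master:templates/bar \
    "file://$work/server" "$work/client"

# 2. Reuse the same specification for sparse-checkout.  Reading the
#    blob triggers an on-demand fetch because the filter omitted it.
mkdir -p "$work/client/.git/info"
git -C "$work/client" cat-file blob master:templates/bar \
    > "$work/client/.git/info/sparse-checkout"

# 3. Turn sparse-checkout on and populate the working tree.
git -C "$work/client" config core.sparseCheckout true
git -C "$work/client" checkout -q master

ls -R "$work/client/src"
```

After the checkout, only src/bar is present in the working tree, and
the blobs under src/other were never downloaded.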


> 
> 4. Showing log for sparsely cloned repo.
> Preconditions: #3 is followed
> Actions:
> git log
> Expected results: recent changes that affect src/bar tree.

If I understand your meaning, log would only show changes
within the sparse subset of the tree.  This is not on my
radar for partial clone/fetch.  It would be a nice feature
to have, but I think it would be better to think about it
from the point of view of sparse-checkout rather than clone.


> 
> 5. Showing diff.
> Preconditions: #3 is followed
> Actions:
> git diff HEAD^ HEAD
> Expected results: changes from the most recent commit affecting
> src/bar folder are shown.
> Notes: this can be tricky operation as filtering must be done to
> remove results from unrelated subtrees.

I don't have any plan for this and I don't think it fits within
the scope of clone/fetch.  I think this too would be a sparse-checkout
feature.


> 
> *Note that I intentionally didn't mention use cases that are related
> to filtering by blob size as I think we should logically consider them
> as a separate, although related, feature.

I've grouped blob-size and sparse filter together for the
purposes of clone/fetch since the basic mechanisms (filtering,
transport, and missing object handling) are the same for both.
They do lead to different end-uses, but that is above my level
here.
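
To show the shared mechanism, here is a sketch of the blob-size case
using the blob:limit filter as it exists in released Git, where
blob:limit=1k omits blobs of at least 1024 bytes.  The file names and
$work are hypothetical.  The missing-object check is the same one that
applies to a sparse filter.

```shell
# Hypothetical scratch area and repository contents.
work=$(mktemp -d)

git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
echo "small" > "$work/server/small.txt"
head -c 4096 /dev/zero > "$work/server/large.bin"
git -C "$work/server" add -A
git -C "$work/server" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "initial"
git -C "$work/server" config uploadpack.allowFilter true

# Clone while omitting blobs of 1 KiB or more; --no-checkout keeps the
# checkout step from fetching them back on demand.
git clone -q --no-checkout --filter=blob:limit=1k \
    "file://$work/server" "$work/client"

# Lines starting with "?" are objects the client does not have locally.
missing=$(git -C "$work/client" rev-list --objects --missing=print HEAD \
    | grep '^?' || true)
small=$(git -C "$work/server" rev-parse master:small.txt)
large=$(git -C "$work/server" rev-parse master:large.bin)
echo "missing objects:"
echo "$missing"
```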


> 
> What do you think about these examples above? Is that something that
> more-or-less fits into current development? Are there other important
> flows that I've missed?

These are all good ideas and it is good to have someone else who
wants to use partial+sparse thinking about it and looking for gaps
as we try to make a complete end-to-end feature.
> 
> -Vitaly

Thanks
Jeff

