Git development
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Nicolas Pitre <nico@cam.org>,
	Miklos Vajna <vmiklos@frugalware.org>,
	Rohan Dhruva <rohandhruva@gmail.com>,
	git@vger.kernel.org
Subject: Re: GSoC 2009 Prospective student
Date: Mon, 23 Feb 2009 07:58:36 -0800
Message-ID: <20090223155836.GI22848@spearce.org>
In-Reply-To: <m3y6vxupvf.fsf@localhost.localdomain>

Jakub Narebski <jnareb@gmail.com> wrote:
> Nicolas Pitre <nico@cam.org> writes:
> > On Sun, 22 Feb 2009, Miklos Vajna wrote: 
> > > 
> > > http://thread.gmane.org/gmane.comp.version-control.git/55254/focus=55298
> > > 
> > > Especially Shawn's message, which can be a base for your proposal, if
> > > you want to work in this.
> > 
> > I don't particularly agree with Shawn's proposal.  Reliance on a stable 
> > sorting on the server side is too fragile, restrictive and cumbersome.

We already rely on a stable sort in the tree format.  Asking that
a stable sort be applied when a clone is started so that we can
later resume it isn't unreasonable.  Hell, that tree format sort
is a B***H anyway, it's not a simple sort by memcmp().  Almost every
Git re-implementation gets it wrong the first time out.
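
To illustrate why a plain memcmp() on the names is not enough, here
is a minimal Python sketch of the tree ordering as I understand it:
a directory entry compares as if its name ended in "/".  The helper
names (tree_sort_key, sort_tree_entries) and the (name, is_dir)
tuple layout are made up for this example and are not Git's own.

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    # Directories compare as if their name had a trailing "/".
    return name + b"/" if is_dir else name


def sort_tree_entries(entries):
    """Sort (name, is_dir) pairs in tree order (sketch, not Git's code)."""
    return sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))


entries = [(b"foo", True), (b"foo.c", False), (b"foo-bar", False)]

# Plain byte-wise sort on the names alone puts the directory first.
naive_order = sorted(name for name, _ in entries)

# Tree order puts the directory "foo" last, because "/" (0x2f)
# sorts after "-" (0x2d) and "." (0x2e).
tree_order = [name for name, _ in sort_tree_entries(entries)]
```

The two orders disagree on where the directory "foo" goes, which is
the kind of case a naive implementation tends to miss.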
 
> > Restartable clone is _hard_.  Even I, who have quite a bit of knowledge in 
> > the affected area, haven't found a satisfactory solution yet.

Sure, it's difficult, but nobody has put effort into it either.
I think it could be done by enforcing a stable sort during clone
(and perhaps only during clone).  That's the basis of that message
Miklos points to.  Though I don't think I ever said anything about
the stable sort only being used during clone.
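
As a rough sketch of the idea only: if both attempts enumerate the
same set of objects in the same deterministic order, a resumed clone
can skip whatever already arrived.  Sorting by object ID and counting
received objects are assumptions made for this example, not the
ordering or the protocol from the earlier message, and the function
names are hypothetical.

```python
def stable_object_order(object_ids):
    """Return the objects in a deterministic order (here: sorted by ID)."""
    return sorted(object_ids)


def objects_to_send(object_ids, already_received=0):
    """Objects still owed to a client that already has the first N."""
    return stable_object_order(object_ids)[already_received:]


# First attempt: the server enumerates objects in some arbitrary order.
first_attempt = objects_to_send(["c3", "a1", "b2"])

# Resume: enumeration order differs, but the stable sort hides that,
# so a client that received two objects is only sent the third.
resumed = objects_to_send(["b2", "c3", "a1"], already_received=2)
```

This only works while the set of objects stays the same between the
two attempts; anything that changes the set would need extra handling.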

> I think it is possible for dumb protocols (using commit walkers) and
> for (deprecated) rsync.

Yes, it is possible for the commit walkers to implement a restart,
as they are actually beginning at the current root and walking back
in history.  Resuming a large file like a pack is easy to do on HTTP
if the remote server supports byte range serving.  It's also easy
to validate on the client that the pack wasn't repacked during the
idle period (between initial fetch and restart), just validate the
SHA-1 footer.  If the pack was repacked and came up with the same
name you'll have a mismatch on the footer.  Discard and try again.
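
A minimal Python sketch of that footer check, assuming the pack ends
with a 20-byte SHA-1 of everything before it.  The function name and
the fake pack contents are made up for the example.

```python
import hashlib

TRAILER_LEN = 20  # length of a raw SHA-1 digest in bytes


def pack_trailer_ok(pack: bytes) -> bool:
    """True if the last 20 bytes are the SHA-1 of everything before them."""
    if len(pack) <= TRAILER_LEN:
        return False
    body, trailer = pack[:-TRAILER_LEN], pack[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer


def fake_pack(body: bytes) -> bytes:
    # Stand-in for a real pack: arbitrary body plus its SHA-1 trailer.
    return body + hashlib.sha1(body).digest()


old = fake_pack(b"PACK" + b"old objects " * 4)
new = fake_pack(b"PACK" + b"new objects " * 4)

# Simulate a resume across a repack: first half from the old file,
# the rest (including the trailer) from the new one.
half = len(old) // 2
stitched = old[:half] + new[half:]
```

Here pack_trailer_ok(old) and pack_trailer_ok(new) hold, while the
stitched file fails the check and should be discarded.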

And if you want to save bandwidth, always grab the last 20 bytes
of the file before getting any other parts, save it somewhere,
and revalidate that last 20 before resuming.  If it's changed,
you should discard what you have and start over from the beginning.
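
Sketched in Python, assuming the server honors HTTP Range requests:
a suffix range fetches just the trailer, and an open-ended range
resumes from the bytes already on disk.  The helper names are
hypothetical, and the actual HTTP calls are left out so the logic can
be read on its own.

```python
TRAILER_LEN = 20  # the pack's SHA-1 footer


def trailer_range_header() -> dict:
    # Suffix byte range: "the last 20 bytes of the file".
    return {"Range": f"bytes=-{TRAILER_LEN}"}


def resume_range_header(bytes_on_disk: int) -> dict:
    # Open-ended range: everything from this offset to the end.
    return {"Range": f"bytes={bytes_on_disk}-"}


def resume_offset(saved_trailer: bytes, current_trailer: bytes,
                  bytes_on_disk: int) -> int:
    """Offset to continue from; 0 means discard and start over."""
    if saved_trailer != current_trailer:
        return 0
    return bytes_on_disk


same = resume_offset(b"\x01" * 20, b"\x01" * 20, 4096)
changed = resume_offset(b"\x01" * 20, b"\x02" * 20, 4096)
```

With an unchanged trailer the client continues at byte 4096; with a
changed one it goes back to offset 0.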

> > I think restartable clone is a really bad suggestion for SOC students.  
> > After all we want successful SOC projects, not ones that even core git 
> > developers did not yet find a good solution for.
> > 
> > IMHO of course.
> 
> But I agree that within current limits (as far as I know there is no
> way to ask for a SHA-1; you can only ask for refs, for security reasons)
> it would be difficult to very difficult to add restartable clone
> support to native (smart) protocols.
> 
> If not for this limitation it would be, I think, possible to do a kind
> of fsck, checking which commits in packfile are complete (i.e. have
> all objects), and based on that ask for subset of objects.  This would
> require support only from a client... alas, this is not possible.

I think the current "must want advertised ref" restriction is
too strict.  If you make the server check the reachability of the
wanted object, (assuming it can be resolved to a commit) then you
can pick up in the middle of history.  We already (to some extent)
support that with the deepen thing in a shallow clone.  Sure, it
may cause more server load when clients ask for this partial fetch.
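
The check itself is just a history walk from the advertised tips.
A minimal Python sketch, with the commit graph modeled as a plain
dict from commit to parents; the function name and the toy graph are
assumptions for illustration, not how the server stores anything.

```python
from collections import deque


def is_reachable(want, advertised_tips, parents):
    """True if `want` is an advertised tip or an ancestor of one."""
    seen = set()
    queue = deque(advertised_tips)
    while queue:
        commit = queue.popleft()
        if commit == want:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return False


# A <- B <- C is advertised history; X is a side commit on top of A
# that no advertised ref points to.
parents = {"C": ["B"], "B": ["A"], "A": [], "X": ["A"]}

mid_history = is_reachable("B", ["C"], parents)
unadvertised = is_reachable("X", ["C"], parents)
```

A want for "B" would be accepted even though only "C" is advertised,
while a want for "X" would still be refused.  The walk is where the
extra server load comes from.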

But clients can already abuse a server far more by repeatedly starting
a clone and then breaking the network connection as soon as the PACK
header comes down the wire.  By that point the server has already spent
a lot of CPU and IO time building the complete list of objects to
transmit.  It's really a non-trivial load on the server side.  And by
breaking the pipe at the 'PACK' header, the client doesn't have to
absorb the large data transfer either.  That makes it fairly easy to
DoS a Git daemon with a small botnet.

So, IMHO, the restriction that a commit must be advertised, and not
merely reachable, is overly strict and doesn't buy us a whole lot.
 
> I think that unless 'restartable clone' is limited to commit walkers
> (HTTP protocol etc.) it should be moved up in difficulty from the "New to
> Git?" section. I guess that mirror-sync, formerly GitTorrent, could be
> easier to implement.

Maybe.  But a simple stable sort on the objects makes it easier,
perhaps within reach of "new to git".

That ideas page is a wiki for a reason.  If folks feel differently
from me, please edit it to improve things!  :-)

-- 
Shawn.
