From: Jakub Narebski <jnareb@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org
Subject: Re: Errors cloning large repo
Date: Mon, 12 Mar 2007 12:09:33 +0100
Message-ID: <200703121209.35052.jnareb@gmail.com>
In-Reply-To: <20070311020013.GD10343@spearce.org>
Shawn O. Pearce wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
>> Shawn O. Pearce wrote:
>>
>>> One thing that you could do is segment the repository into multiple
>>> packfiles yourself, and then clone using rsync or http, rather than
>>> using the native Git protocol.
>>
>> By the way, it would be nice to have talked about fetch / clone
>> support for sending (and creating) _multiple_ pack files. Beside
>> the situation where we must use more than one packfile because
>> of size limits, it would also help clone as it could send existing
>> packs and pack only loose objects (trading perhaps some bandwidth
>> with CPU load on the server; think kernel.org).
>
> I've thought about adding that type of protocol extension on
> more than one occasion, but have now convinced myself that it is
> completely unnecessary. Well at least until a project has more
> than 2^32-1 objects anyway.
>
> The reason is we can send any size packfile over the network; there
> is no index sent so there is no limit on how much data we transfer.
> We could easily just dump all existing packfiles as-is (just clip
> the header/footers and generate our own for the entire stream)
> and then send the loose objects on the end.

But what would happen if a server supporting concatenated packfiles
sends such a stream to an old client? So I think some kind of protocol
extension, or at least a new request / new feature, is needed for that.

Wouldn't it be better to pack loose objects into a separate pack
(and perhaps save it, if some threshold is crossed and we have
write access to the repo), by the way?

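To make the "clip the header/footers and generate our own" idea concrete, here is a minimal sketch in Python. It assumes version 2 packfiles held fully in memory, with the usual layout: a 12-byte header ("PACK", version, 32-bit object count), the object entries, and a 20-byte SHA-1 trailer. The function name `concat_packs` is hypothetical, and the sketch does not deduplicate objects that occur in more than one input pack, nor does it append loose objects.

```python
import hashlib
import struct

HEADER = struct.Struct(">4sII")  # signature, version, object count
TRAILER_LEN = 20                 # SHA-1 over everything that precedes it


def concat_packs(packs):
    """Join several in-memory packfiles into a single pack stream.

    Hypothetical helper: strips each pack's header and trailer,
    then writes one header and one trailer for the whole stream.
    """
    total = 0
    bodies = []
    for data in packs:
        sig, version, count = HEADER.unpack_from(data, 0)
        if sig != b"PACK" or version != 2:
            raise ValueError("not a version 2 packfile")
        total += count
        bodies.append(data[HEADER.size:-TRAILER_LEN])
    # The count field is 32 bits wide, hence the 2^32-1 object limit.
    if total > 0xFFFFFFFF:
        raise ValueError("object count does not fit in 32 bits")
    stream = HEADER.pack(b"PACK", 2, total) + b"".join(bodies)
    return stream + hashlib.sha1(stream).digest()
```

An old client would presumably see this as one ordinary (if oddly ordered) pack, since only the summed count and the recomputed checksum differ from what it normally receives.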
> The client could easily segment that into multiple packfiles
> locally using two rules:
>
> - if the last object was not a OBJ_COMMIT and this object is
>   an OBJ_COMMIT, start a new packfile with this object.
>
> - if adding this object to the current packfile exceeds my local
>   filesize threshold, start a new packfile.
>
> The first rule works because we sort objects by type, and commits
> appear at the front of a packfile. So if you see a non-commit
> followed by a commit, that's the packfile boundary that the
> server had.
>
> The second rule is just common sense. But I'm not sure the first
> rule is even worthwhile; the server's packfile boundaries have no
> real interest for the client.

Without the first rule, wouldn't the client end up with a strange
packfile? Or would it have to rewrite a pack?

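The two splitting rules quoted above can be sketched roughly as follows. This is only an illustration: `segment` and its `(obj_type, size)` input are hypothetical, sizes stand in for the on-disk size of each entry, and delta dependencies between objects are ignored, although a real splitter would have to handle a delta whose base lands in a different pack.

```python
OBJ_COMMIT = 1  # object type number used for commits in Git packfiles


def segment(objects, max_bytes):
    """Split (obj_type, size) pairs into packs using the two quoted rules."""
    packs, current, current_size, prev_type = [], [], 0, None
    for obj_type, size in objects:
        # Rule 1: a non-commit followed by a commit marks a server-side
        # pack boundary, because commits are sorted to the front.
        at_server_boundary = (
            prev_type is not None
            and prev_type != OBJ_COMMIT
            and obj_type == OBJ_COMMIT
        )
        # Rule 2: do not let the current pack grow past the local limit.
        too_big = bool(current) and current_size + size > max_bytes
        if at_server_boundary or too_big:
            packs.append(current)
            current, current_size = [], 0
        current.append((obj_type, size))
        current_size += size
        prev_type = obj_type
    if current:
        packs.append(current)
    return packs
```

Dropping the `at_server_boundary` test leaves only the size rule, in which case a local pack may contain commits in the middle rather than only at the front.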
--
Jakub Narebski
Poland