From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.
Date: Mon, 4 Sep 2006 02:44:43 -0400
Message-ID: <20060904064443.GB30032@spearce.org>
In-Reply-To: <7v3bb8qixi.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> wrote:
> Now if we fix dumb transport downloaders, then we could even
> make a convention that the packs named pack-[0-9a-f]{40}.pack
> are archive packs. And git-repack can even have a convention
> that .git/objects/pack/pack-active.(pack|idx) is the active
> pack.
Seems reasonable.
I take it you are proposing that a dumb transport always downloads
pack-active.pack as pack-n{40}.pack where the dumb protocol
downloader computed the correct pack name from its contents. Thus
any remote pack downloaded over a dumb transport is automatically
treated as a historical pack by the receiving repository.
Someone tracking a remote repository over a dumb transport would
then need to frequently repack a subset of their historical packs
into their own active pack, while leaving the other historical
packs untouched.
But the more that I think about this, neither solution (an active
pack symref or pack-active.pack) really solves it. Being limited
to just one active pack seems to be a problem, at least with the
dumb transports.
I think that's why I preferred the size threshold idea. The active
packs are cheap to repack because they are small. The larger
packs aren't cheap to repack because they are large - and probably
historical. What we are trying to get is fast repacks for the
active objects while still getting full validation anytime we do a
repack and (possibly) destroy the source. A size threshold does it.
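A minimal sketch of the size threshold idea, for illustration only.
The function name, the threshold value, and the directory layout are
assumptions made here, not anything git-repack implements: packs
below the threshold are candidates for a cheap repack, and larger
ones are left alone.

```python
import os
import tempfile


def split_packs_by_size(pack_dir, threshold_bytes):
    """Partition the *.pack files in pack_dir into (small, large).

    "small" packs are below threshold_bytes and would be repacked;
    "large" packs are treated as historical and left untouched.
    Both the function and the threshold are hypothetical.
    """
    small, large = [], []
    for name in sorted(os.listdir(pack_dir)):
        if not name.endswith(".pack"):
            continue  # skip .idx files and anything else
        path = os.path.join(pack_dir, name)
        if os.path.getsize(path) < threshold_bytes:
            small.append(name)
        else:
            large.append(name)
    return small, large


def demo():
    """Build a throwaway directory with one small and one large pack."""
    with tempfile.TemporaryDirectory() as d:
        for name, size in [("pack-a.pack", 10),
                           ("pack-b.pack", 5000),
                           ("pack-a.idx", 3)]:
            with open(os.path.join(d, name), "wb") as f:
                f.write(b"\0" * size)
        return split_packs_by_size(d, threshold_bytes=1024)
```

With a rule like this there is no single "active" pack to track; any
pack that stays under the threshold is simply folded into the next
repack.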
When Jon Smirl and I started kicking around the idea of a historical
pack for Mozilla I was thinking of just storing a list of pack base
names in ".git/objects/pack/historical". Packs listed there should
generally be exempt from repacking. During an initial clone we'd
need to deliver the contents of that file to the new repository,
since if the source considers a pack historical, it's likely the new
repository would want to as well.
But now as I write this email I'm thinking that it may be just as
easy to change the base name of the pack to "hist-n{40}" when we
want to consider it historical.
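The two ways of marking a pack historical described above (a list
file, or a "hist-" base name) can be sketched as one check. This is
only an illustration: the helper names are hypothetical, and the
one-base-name-per-line format of the list file is an assumption.

```python
def read_historical_list(text):
    """Parse the assumed format of .git/objects/pack/historical:
    one pack base name per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_historical(pack_base, listed=()):
    """Return True if a pack should be exempt from routine repacking.

    pack_base is the base name without extension, e.g. "hist-<40 hex>".
    listed holds base names read from the hypothetical list file.
    """
    return pack_base.startswith("hist-") or pack_base in set(listed)
```

The rename needs no extra file to carry along on clone, since the
status travels with the pack's own name.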
[snipped and re-ordered]
> It first downloads the .idx files, so it can compute the
> _right_ packname using the sorted object names recorded there
Why trust the .idx? I've seen you post that the .idx is purely
a local matter. The "smart" Git protocol only receives the .pack
from the remote and computes the .idx locally or unpacks it to loose
objects locally; why should a dumb transport trust the remote .idx?
Oh, I know, when the .idx is >50 MiB, the .pack is >450 MiB, has
2 million objects and delta chains ~5000 long.
Are we thinking that .idx files may need to have a slightly wider
distribution than "local"?
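For reference, a rough sketch of deriving a pack name from the
object names alone, so the name does not depend on who produced the
.idx. It assumes the name is the SHA-1 over the sorted, binary
(20-byte) object names; that matches the "sorted object names"
wording quoted above, but the exact hashing details are an
assumption here, and the function name is hypothetical.

```python
import hashlib


def pack_name_from_objects(hex_names):
    """Compute "pack-<40 hex>" from an iterable of 40-char hex object names.

    Assumption: the name is SHA-1 over the sorted 20-byte binary
    object names. The input order does not matter, and duplicates
    are ignored.
    """
    ctx = hashlib.sha1()
    for name in sorted({n.lower() for n in hex_names}):
        ctx.update(bytes.fromhex(name))
    return "pack-" + ctx.hexdigest()
```

The object names could come either from a downloaded .idx or from
indexing the .pack locally; only the second avoids trusting the
remote .idx, at the cost of reading the whole pack.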
--
Shawn.