From: Junio C Hamano <junkio@cox.net>
To: James Ketrenos <jketreno@linux.intel.com>
Cc: git@vger.kernel.org, torvalds@osdl.org
Subject: Re: GIT overlay repositories
Date: Wed, 13 Jul 2005 15:35:35 -0700
Message-ID: <7vmzoqqqq0.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: 42D5578D.3000301@linux.intel.com
James Ketrenos <jketreno@linux.intel.com> writes:
> Start with two repositories, let's call them Repo-A and Repo-B. Repo-A
> is hosted on some server somewhere and contains lots of code
> (let's say it's a kernel source repository). Repo-B is only adding a
> small amount of changes to the repo (for argument's sake, let's say the
> IPW2100 and IPW2200 projects) on top of what is already provided by Repo-A.
>
> For several reasons, we would like users to be able to get just the
> differences between Repo-A and Repo-B from me.
I did something like this in late April - early May when I
ran the git-jc repository. I took the then-current Linus head from
git.git, placed only my commits and the objects absent from
his tree in a public place, and asked pullers to pull from Linus
first and then from me to get a usable repository. My
understanding is that you are formalizing and automating the
part "please pull from X, Y, Z and then from me to complete what
I have, making it usable". If that is the case, I agree with
the intentions [*1*].
This is related to the reason Linus wanted to have .git/config,
something that records "this object database depends on these
other object databases to be complete", when we talked about
ALTERNATE_OBJECT_DIRECTORIES last time. I am wondering if there
is a way to solve these two related problems in a unified way.
One minor problem is that ALTERNATE_OBJECT_DIRECTORIES is a
local thing and we do not want to be able to express URLs in
there, because we do not want to run rsync nor curl from inside
sha1_file.c. The "partial repository" problem, on the other
hand, is to publish such a repository and you _do_ want to have
URLs, likely to be somewhere completely different from where
such a partial repository is hosted at, reachable from your
pullers.
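To make the "local thing" point concrete, here is a minimal sketch of
an object lookup that only ever touches the local filesystem. The
function names are made up for illustration and this is not the code
in sha1_file.c; it only assumes the loose-object layout of two hex
digits for the directory and the remaining 38 for the file name.

```python
import os


def loose_object_path(object_dir, sha1_hex):
    # Loose objects live at <object_dir>/<first 2 hex digits>/<remaining 38>.
    return os.path.join(object_dir, sha1_hex[:2], sha1_hex[2:])


def find_object(sha1_hex, primary_dir, alternate_dirs):
    """Return the path of the first local copy of the object, or None.

    Hypothetical helper: the primary object directory is tried first,
    then each alternate in order.  No network access happens here,
    which is why URLs have no place in the alternates list.
    """
    for object_dir in [primary_dir, *alternate_dirs]:
        candidate = loose_object_path(object_dir, sha1_hex)
        if os.path.exists(candidate):
            return candidate
    return None
```

Anything that needs rsync or curl would sit outside such a lookup, in
the pull tooling.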
Regardless of whatever we end up doing, I have one proposal to
make. How about having .git/objects/info/ directory for housing
various "object database" specific (and not repository specific)
information?
The files I would see immediate benefit from are:
  objects/info/ext  -- this is your .git/refs/ancestors file [*2*],
                       that lists external URLs that the objects
                       in this object database depends upon,
                       along with the set of head commits
                       there to start pulling from to complete
                       this partial object database [*3*]. This
                       _should_ name URLs accessible to
                       expected pullers from this repository.
  objects/info/alt  -- list of local alternate object
                       directories; probably we should
                       deprecate ALTERNATE_OBJECT_DIRECTORIES
                       environment variable with this, and
                       rewrite parts of sha1_file.c. I'd
                       volunteer to do it if we have
                       consensus.
  objects/info/pack -- list of pack files in objects/pack/;
                       this would be useful for discovery
                       through really dumb web servers [*4*].
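The on-disk format of objects/info/ext is not pinned down here. Purely
as an assumption for illustration, suppose each line holds a URL
followed by the head commit names it depends on, separated by
whitespace, with blank lines and lines starting with '#' ignored. A
reader for that hypothetical format could be sketched like this:

```python
def parse_ext(text):
    """Parse a hypothetical objects/info/ext file.

    Assumed format (not specified anywhere in this thread):
        <url> <head> [<head> ...]
    Returns a list of (url, [heads]) tuples in file order.
    """
    deps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        url, *heads = line.split()
        deps.append((url, heads))
    return deps
```

Allowing several heads per URL follows the remark in footnote *3* that
a set of heads is probably needed rather than a single one.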
Using something like the above structure, pulling from this
"partial" repository at rsync://abc.xz/x.git would go this way:
 (1) Sync from rsync://abc.xz/x.git/objects/
 (2) Read objects/info/ext just slurped from there. Run the
     procedure (1) thru (3) against the URLs listed in the
     file, recursively.
 (3) [*5*] Read objects/info/alt just slurped from there. Say
     it contained ../a.git and ../b.git. Run the procedure
     (1) thru (3) against rsync://abc.xz/a.git/objects/ and
     rsync://abc.xz/b.git/objects/ recursively.
 (4) Sync from rsync://abc.xz/x.git/refs/ as needed.
Non-rsync transfer can and should be done the same way. In
either case, updating the puller's refs/ is done solely based on
the information from rsync://abc.xz/x.git/refs/ and not from
refs in the depended-upon repositories.
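The procedure above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the transport object with
its sync() and read_lines() methods is invented for the sketch (a
stand-in for rsync or any other transfer), each info/ext line is
assumed to start with a URL, and the guard against visiting the same
repository twice is my addition.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_alt(repo_url, alt):
    # Turn a relative info/alt entry such as "../a.git" into a URL on the
    # same host, resolved relative to the repository path.
    parts = urlsplit(repo_url)
    path = posixpath.normpath(posixpath.join(parts.path, alt))
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def pull_objects(repo_url, transport, seen=None):
    """Steps (1) thru (3), applied recursively; returns the URLs visited."""
    seen = set() if seen is None else seen
    if repo_url in seen:          # assumption: skip repositories already done
        return seen
    seen.add(repo_url)
    transport.sync(repo_url + "/objects/")                        # step (1)
    for line in transport.read_lines(repo_url, "objects/info/ext"):
        pull_objects(line.split()[0], transport, seen)            # step (2)
    for alt in transport.read_lines(repo_url, "objects/info/alt"):
        pull_objects(resolve_alt(repo_url, alt), transport, seen)  # step (3)
    return seen


def pull(repo_url, transport):
    pull_objects(repo_url, transport)
    # Step (4): refs come only from the repository being pulled, never
    # from the repositories it depends upon.
    transport.sync(repo_url + "/refs/")


class FakeTransport:
    """In-memory stand-in that records what would have been synced."""

    def __init__(self, info_files):
        self.info_files = info_files  # {repo_url: {relative_path: [lines]}}
        self.synced = []

    def sync(self, url):
        self.synced.append(url)

    def read_lines(self, repo_url, relative_path):
        return self.info_files.get(repo_url, {}).get(relative_path, [])
```

With a FakeTransport whose x.git lists one external URL and the
alternate ../a.git, calling pull("rsync://abc.xz/x.git", transport)
records the objects/ syncs for all three repositories and a single
refs/ sync for x.git only.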
Am I basically on the same page as you are?
[Footnote]
*1* But you are conflating it with something else. I will not
comment on the part where you talk about merges in this message,
because forward-porting your own changes to updated upstream is
orthogonal to the "partial object database" issue. It needs to be
taken care of independently even if you maintain a fully
populated object database for your development.
*2* Ancestry is an overused word, so I would propose to call
this "external dependency": your partial object database depends
on them to be complete and usable.
*3* At first I was wondering why you wanted to record the foreign
head in addition to the URLs, but you need that information (and
probably that needs to be a set of heads, not just a single
head) because their head may move, and in the worst case may
even be rewound to something that is not a descendant of what
you depend upon.
*4* This is not related to the current topic.
*5* This part is optional, because some alternates used locally
at the partial repository site may not be exposed to the same
pullers, or even when exposed, the alternate site may be behind
a slow link and it is preferable to get the same information
from somewhere else listed in info/ext file. On the other hand,
it is simpler to just maintain info/alt file to serve for both
local and remote purposes. I don't offhand know the tradeoffs.
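To illustrate the concern in footnote *3*: a puller could check that a
recorded head is still reachable from the external repository's
current head before trusting it. A minimal sketch, assuming the commit
graph is available as a plain mapping from each commit to its parents
(the mapping and the function name are made up for this example):

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`.

    `parents` maps a commit name to a list of its parent commit names.
    A commit counts as its own ancestor here.
    """
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False
```

If the upstream head has been rewound onto a different line of
history, the check fails for the recorded head, which is exactly the
case where pulling from the URL alone would not complete the partial
object database.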