git.vger.kernel.org archive mirror
* [RFC] Light-weight checkouts via ".gitlink"
From: Josef Weidendorfer @ 2006-12-08 21:52 UTC
  To: git

Hi,

when I recently thought about submodule support, I had the
idea that it is easier to get there by going in small,
incremental steps, introducing useful subfeatures on their
own along the way.

The following is one outcome of this: a proposal for
light-weight checkouts of git branches, without the need
to have the full repository in a .git subdirectory. Instead,
the checkout just has a file .gitlink, as simple as possible,
which manages the link to the real repository.

Of course, this feature is tailored to support checkouts of
submodules as being such light-weight checkouts themselves.
The current proposal just says that light-weight checkouts
should be ignored when existing inside of another checkout.
For real submodule support, we would want to be able to
do a "git add" on these light-weight checkouts (which of
course are bound to some commit), and on "git commit", this
would add a "submodule object" at this place into the tree
of the outer repository.

Comments?

Josef


============================================================

Support for multiple external light-weight checkouts
aka ".gitlink" proposal


Main ideas behind light-weight checkouts
----------------------------------------

Make submodules easier to implement by separating
part of the needed infrastructure into an independent,
yet useful feature.

(1) Allow separating a branch checkout from its repository location
on the local filesystem. This minimally needs to be only _one_ file,
called ".gitlink" here. One should be able to move the
checkout directory around (within some limits) without breaking
the link to its "base" repository.

We want this later for submodule checkouts inside of a supermodule
checkout: (a) to not lose the objects+index+HEAD if the user
removes the whole checkout to remove the submodule in next
supermodule commit; (b) to allow for moving submodules around
between supermodule commits.

This can be implemented with a small script reading .gitlink and setting
up $GIT_DIR and $GIT_INDEX_FILE accordingly. However, there should be
a further environment variable to use a file for HEAD ($GIT_HEAD_FILE?).
Interpretation of the link to the base directory has to be a little smart,
i.e. by prefixing a relative path with as many ".." as needed to find a
git repository.

(2) Light-weight checkouts should work inside of another
checkout, be it a normal or a light-weight checkout itself.
The subdirectory of the light-weight checkout has to be ignored
in the outer checkout (at least by default).

This is needed to be able to use light-weight checkouts for submodules,
as these should be checked out inside of another checkout (even
inside of another submodule checkout for hierarchical submodules).

This can be implemented by enhancing git to ignore any subdirectory which
has a file .gitlink in it.
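
A minimal sketch of this ignore rule, written in Python purely for
illustration (git itself is not implemented this way). The function name
outer_checkout_files is hypothetical; the only thing taken from the
proposal is the rule that a subdirectory containing a .gitlink file is
skipped by the outer checkout.

```python
import os

LINK_FILE = ".gitlink"  # file name taken from the proposal


def outer_checkout_files(root):
    """Yield paths (relative to root) that belong to the outer checkout.

    The .git directory and every subdirectory that contains a .gitlink
    file are skipped. The root itself is never skipped, so the outer
    checkout may be a light-weight checkout as well.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into nested checkouts.
        dirnames[:] = sorted(
            d for d in dirnames
            if d != ".git"
            and not os.path.isfile(os.path.join(dirpath, d, LINK_FILE))
        )
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

With a "todo" subdirectory that holds a .gitlink file, none of the files
below "todo" would be reported for the outer checkout.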


Example usage
-------------

Keep a checkout of the todo branch of git.git inside of the checkout of
any other branch (or fully outside). You should be able to use all the normal
git commands inside of this directory (commit etc).
For example, inside of a master checkout of git.git:

 mkdir todo
 cd todo
 touch .gitlink
 git checkout todo

The .gitlink file can be empty in this case if the smart lookup of the
base repository does the right thing with a default of an empty relative path.

Another usage is to keep all your git repositories (looking like bare ones)
in one place, e.g. below $HOME/git-repositories/, and have the checkouts
you are working in at another place, without the need for setting $GIT_DIR etc.
for this.


Some thoughts about implementation
----------------------------------

As we need our own index and HEAD for any light-weight checkout, we
can choose a subdirectory of .git of the base repository to store a full,
independent working GITDIR on its own. It would default to linking the
objects and refs namespace to the base repository, but for submodules,
it *can* have its own object database and refs namespace; it just would
be locatable via the GITDIR of the base repository.

"Locatable" means that we need a name for different light-weight
checkouts. The name can be determined from the relative position of
the light-weight checkout to its base repository, or could be fixed
and specified in the .gitlink file.

Example for a light-weight checkout with name "mywork", linking to
the (bare) base repository "base":

 base.git/external/mywork/HEAD
 base.git/external/mywork/index
 base.git/external/mywork/refs -> ../../refs
 base.git/external/mywork/objects -> ../../objects

A light-weight checkout in a sibling directory to base.git needs
the relative path to its own .git directory:
"../base.git/external/mywork".

This can be split up in two things:
- relative path to base repository
- a name for this light-weight checkout
One has to be able to get this out of the content of the .gitlink file.


Proposal for .gitlink entries (one line per entry)
--------------------------------------------------

* Gitdir = "<Path to base git repository>"

Optional.
An absolute or relative path to the base git repository.
With relative path, a heuristic is used to find the
git directory: the path will be prefixed by as
many "../" as needed, and ".git" or .gitlink appended.
The value defaults to an empty relative path, which
will check all parent directories for a .git subdirectory.

* Name: <explicit name for this checkout>

Optional.
Git uses this name to find its own GITDIR in the GITDIR
of the base repository. If not specified, the name defaults
to the relative path of the light-weight checkout from
the base directory, stripping any ".." in front.
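
The lookup described for the Gitdir entry could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
function names parse_gitlink and find_base_gitdir are made up, keys are
compared case-insensitively, only the "key = value" form is parsed, the
search starts in the parent of the checkout directory, and the variant
where ".gitlink" is appended instead of ".git" is left out.

```python
import os


def parse_gitlink(text):
    """Parse 'Key = value' lines into a dict with lower-cased keys."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank or malformed line: ignore it
        entries[key.strip().lower()] = value.strip().strip('"')
    return entries


def find_base_gitdir(checkout_dir, gitdir_value=""):
    """Return the base repository for a checkout, or None if not found.

    An absolute Gitdir value is used as given. A relative value gets
    ".git" appended and is tried in every parent directory, walking
    upwards; an empty value therefore looks for a plain ".git".
    """
    if os.path.isabs(gitdir_value):
        return gitdir_value if os.path.isdir(gitdir_value) else None
    current = os.path.abspath(checkout_dir)
    while True:
        parent = os.path.dirname(current)
        if parent == current:
            return None  # reached the filesystem root
        candidate = os.path.join(parent, gitdir_value + ".git")
        if os.path.isdir(candidate):
            return candidate
        current = parent
```

For a checkout directory that is a sibling of "base.git", a .gitlink
containing "Gitdir = base" would resolve to that sibling; an empty
.gitlink inside an ordinary checkout would resolve to its ".git".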



For the "mywork" example above, we could have a checkout
directory, sibling to "base.git" and called "mywork".
For the .gitlink file inside, it is enough to specify

 Gitdir = base

Because the search heuristic will find the base repository
at real relative path "../base.git". Further, relative
path of the checkout from the base repository is "../mywork",
giving the name "mywork".
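
The default-name rule from this example can be sketched as below. The
function name default_checkout_name is hypothetical, and POSIX-style
absolute paths are assumed so that the result does not depend on the
current directory.

```python
import posixpath


def default_checkout_name(checkout_dir, base_dir):
    """Derive the default Name of a light-weight checkout.

    The name is the path of the checkout relative to the base
    repository, with any leading ".." components stripped.
    """
    rel = posixpath.relpath(checkout_dir, base_dir)
    parts = rel.split("/")
    # relpath only ever produces ".." components at the front.
    while parts and parts[0] in ("..", "."):
        parts.pop(0)
    return "/".join(parts)
```

For a checkout at /work/mywork and a base repository at /work/base.git,
the relative path is "../mywork" and the derived name is "mywork".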

When moving the checkout into another directory, its name
would change if not explicitly specified in the .gitlink
file. For submodules, this could be the desired semantics.
However, with a changing name, we may lose track of our
index and HEAD.
A solution for this could be to always store a copy of HEAD
in .gitlink, or even to have a "Head:" .gitlink entry itself
serve as HEAD.

============================================================


* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Jakub Narebski @ 2006-12-08 22:18 UTC
  To: git

A few (very few) comments:

Josef Weidendorfer wrote:

> This can be implemented by enhancing git to ignore any subdirectory which
> has a file .gitlink in it.

If I remember correctly, while git ignores .git, it does not ignore
by default (i.e. without an entry in either GIT_DIR/info/exclude or
.gitignore) the directory which has a .git directory in it.

And that should not change for .gitlink. You can always add a
.gitignore file with * .* patterns in it (ignore all).
 
> * Gitdir = "<Path to base git repository>"
[...]
> * Name: <explicit name for this checkout>

Why use the "key = value" form once and the "key: value" form once?
Better to stick with one. I would prefer "key = value".

GIT_DIR = path to base git repository
it is equivalent to setting the following:

GIT_INDEX_FILE = path to index file
GIT_OBJECT_DIRECTORY = path to object directory
GIT_HEAD_FILE = path to HEAD file
GIT_REFS_DIRECTORY = path to refs directory

NAME = name
should match "name subdirectory" entry in modules file in superproject.


Perhaps instead of adding arbitrary number of .. in front of relative
path, we better use some magic, like ... for finding somewhere up?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Josef Weidendorfer @ 2006-12-08 22:54 UTC
  To: Jakub Narebski; +Cc: git

On Friday 08 December 2006 23:18, Jakub Narebski wrote:
> A few (very few) comments:
> 
> Josef Weidendorfer wrote:
> 
> > This can be implemented by enhancing git to ignore any subdirectory which
> > has a file .gitlink in it.
> 
> If I remember correctly, while git ignores .git, it does not ignore
> by default (i.e. without entry in either GIT_DIR/info/excludes, or
> .gitignore) the directory which has .git directory in it.

I know. But this is essential. We _have_ to ignore all the files and
subdirectories in the directory which contains the .gitlink file,
as these files/subdirectories belong to the submodule.

There is no other way. You could try to use a special name for the
whole directory with the light-weight checkout, e.g. ".checkout".

But then, this is useless for submodules, as for submodules, we want to
be able to specify the root directory name of the submodule, as that
is the name which will end up in the tree object of the supermodule.

> And that should not change for .gitlink. You can always add
> .gitignore file with * .* patterns in it (ignore all).

That is not possible:
.gitignore file has its own meaning inside of the light-weight
checkout aka submodule, as this directory is the root directory of
a git checkout.

AFAIK, Martin's submodule support does it the same, only for directories
with .git, as he stores the GITDIR directly in the submodule
checkout.


> > * Gitdir = "<Path to base git repository>"
> [...]
> > * Name: <explicit name for this checkout>
> 
> Why use the "key = value" form once and the "key: value" form once?
> Better to stick with one. I would prefer "key = value".

Sorry. Typo ;-)


> GIT_DIR = path to base git repository
> it is equivalent to setting the following:
> 
> GIT_INDEX_FILE = path to index file
> GIT_OBJECT_DIRECTORY = path to object directory
> GIT_HEAD_FILE = path to HEAD file
> GIT_REFS_DIRECTORY = path to refs directory

AFAIK the latter two do not exist yet, or do they?

I would also be fine with .gitlink looking like some shell script,
defining these variables. However, we need the smart directory
lookup.
And IMHO the keys can be case insensitive as in .git/config.

I am not sure we want to allow the freedom of being able to put any of
GIT_INDEX_FILE, GIT_OBJECT_DIRECTORY, GIT_HEAD_FILE, GIT_REFS_DIRECTORY
in the .gitlink file.

It is enough if GITDIR and NAME are given. With GITDIR_REAL after the
smart lookup, e.g. GIT_INDEX_FILE would default to $GITDIR_REAL/external/$NAME
and so on.
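
As a rough sketch of such defaults: the function name
gitlink_environment is hypothetical, and the index is assumed here to be
a file named "index" inside the per-checkout directory. GIT_DIR,
GIT_INDEX_FILE and GIT_OBJECT_DIRECTORY are existing git environment
variables; HEAD and refs are simply taken from the per-checkout GIT_DIR,
since no GIT_HEAD_FILE or GIT_REFS_DIRECTORY variable exists.

```python
import posixpath


def gitlink_environment(gitdir_real, name, area="external"):
    """Derive default environment settings from GITDIR_REAL and NAME.

    gitdir_real is the base repository found by the smart lookup and
    name identifies this light-weight checkout. The checkout gets its
    own GIT_DIR below the base repository and shares the object store.
    """
    private = posixpath.join(gitdir_real, area, name)
    return {
        "GIT_DIR": private,
        "GIT_INDEX_FILE": posixpath.join(private, "index"),
        "GIT_OBJECT_DIRECTORY": posixpath.join(gitdir_real, "objects"),
    }
```

A wrapper script could export these values before running any git
command inside the light-weight checkout.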

However, for submodules we really _want_ to have fully independent GITDIRs
for each submodule somewhere, and we would have to warn:

 # Warning: if you change one GIT_INDEX, ... in this file, you
 # will screw up the possibility to clone from the GITDIR directory


> NAME = name
> should match "name subdirectory" entry in modules file in superproject.

Yes.
This would be in my next proposal about how to build the submodule support
on light-weight checkouts ;-)

 
> Perhaps instead of adding arbitrary number of .. in front of relative
> path, we better use some magic, like ... for finding somewhere up?

I thought about it. But why would you need it?
If the value of GITDIR in .gitlink begins with "/", it is an absolute path.
If not, I think you always want the smart lookup to go upwards, i.e.
looking for

  ../<relpath>.git
  ../../<relpath>.git
  ../../../<relpath>.git

So there is no need to add "..." in front of the relative path.
Or do you see a use case for
 rel/path/start/.../rel/path/end

Ah, yes, I see. Perhaps this makes sense with absolute paths:

	/home/user/repos/.../linux

Josef


* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Jakub Narebski @ 2006-12-08 23:24 UTC
  To: Josef Weidendorfer; +Cc: git

On Friday 8 December 2006 23:54, Josef Weidendorfer wrote:
> On Friday 08 December 2006 23:18, Jakub Narebski wrote:
>> A few (very few) comments:
>> 
>> Josef Weidendorfer wrote:
>> 
>>> This can be implemented by enhancing git to ignore any subdirectory which
>>> has a file .gitlink in it.
>> 
>> If I remember correctly, while git ignores .git, it does not ignore
>> by default (i.e. without entry in either GIT_DIR/info/excludes, or
>> .gitignore) the directory which has .git directory in it.
> 
> I know. But this is essential. We _have_ to ignore all the files and
> subdirectories in the directory which contains the .gitlink file,
> as these files/subdirectories belong to the submodule.
> 
> There is no other way. You could try to use a special name for the
> whole directory with the light-weight checkout, e.g. ".checkout".
> 
> But then, this is useless for submodules, as for submodules, we want to
> be able to specify the root directory name of the submodule, as that
> is the name which will end up in the tree object of the supermodule.
> 
>> And that should not change for .gitlink. You can always add
>> .gitignore file with * .* patterns in it (ignore all).
> 
> That is not possible:
> .gitignore file has its own meaning inside of the light-weight
> checkout aka submodule, as this directory is the root directory of
> a git checkout.

I had forgotten about that. Right.

The only possibility would be to use GIT_DIR/info/exclude with the path
to the submodule, and this conflicts with the ability to rename and move
submodules.

> AFAIK, Martin's submodule support does it the same, only for directories
> with .git, as he stores the GITDIR directly in the submodule
> checkout.

Ah. 

[...]
>> GIT_DIR = path to base git repository
>> it is equivalent to setting the following:
>> 
>> GIT_INDEX_FILE = path to index file
>> GIT_OBJECT_DIRECTORY = path to object directory
>> GIT_HEAD_FILE = path to HEAD file
>> GIT_REFS_DIRECTORY = path to refs directory
> 
> AFAIK the latter two do not exist yet, or do they?

They do not exist; perhaps they should for completeness.

[...] 
> It is enough if GITDIR and NAME is given. With GITDIR_REAL after the
> smart lookup, e.g. GIT_INDEX_FILE would default to $GITDIR_REAL/external/$NAME
> and so on.

Not $GITDIR_REAL/submodules/<name>/index (or modules instead of
submodules)?

>> NAME = name
>> should match "name subdirectory" entry in modules file in superproject.
> 
> Yes.
> This would be in my next proposal about how to build the submodule support
> on light-checkouts ;-)

I had thought that with "each submodule as separate repository" approach
to submodules the modules file would have module name and either
subdirectory in which submodule resides, or GIT_DIR of submodule. And
this file could be generated on checkout... which doesn't survive closer
scrutiny.

But this would work well with submodules, that's a fact.
  
>> Perhaps instead of adding arbitrary number of .. in front of relative
>> path, we better use some magic, like ... for finding somewhere up?
> 
> I thought about it. But why would you need it?
> If the value of GITDIR in .gitlink begins with "/", it is an absolute path.
> If not, I think you always want the smart lookup to go upwards, i.e.
> looking for
> 
>   ../<relpath>.git
>   ../../<relpath>.git
>   ../../../<relpath>.git
> 
> So there is no need to add "..." in front of the relative path.
> Or do you see a usecase for
>  rel/path/start/.../rel/path/end
> 
> Ah, yes, I see. Perhaps this makes sense with absolute paths:
> 
> 	/home/user/repos/.../linux

You mean that the above means to check the following paths:

  /home/user/repos/linux
  /home/user/linux
  /home/linux
  /linux

and not searching the subdirectories of /home/user/repos for a linux
directory (there can be many)? BTW the web2c implementation of TeX,
namely kpathsea(rch), uses // for that, i.e. a//b means b which
is somewhere in the subdirectories of a.
-- 
Jakub Narebski


* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Josef Weidendorfer @ 2006-12-08 23:25 UTC
  To: Jakub Narebski; +Cc: git

On Friday 08 December 2006 23:54, Josef Weidendorfer wrote:
> > NAME = name

Forgot to mention in the proposal:
If you recursively have light-weight checkouts inside each other,
the real "name" (for .git/external/<name>/ and for further submodule
configuration e.g. in .git/modules of the base repository)
should of course be the concatenation of the names in the .gitlink
files while going up to the base repository.

> > Perhaps instead of adding arbitrary number of .. in front of relative
> > path, we better use some magic, like ... for finding somewhere up?

No need. Something like

> 	/home/user/.../linux

is crazy. Do you want to scan all of your home directory every time this
lookup is needed? So "..." really only makes sense in front of the
relative path, but there, you also can leave it out.



* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Josef Weidendorfer @ 2006-12-08 23:40 UTC
  To: Jakub Narebski; +Cc: git

On Saturday 09 December 2006 00:24, Jakub Narebski wrote:
> > That is not possible:
> > .gitignore file has its own meaning inside of the light-weight
> > checkout aka submodule, as this directory is the root directory of
> > a git checkout.
> 
> I have forgot about that. Right.
> 
> The only possibility would be to use GIT_DIR/info/excludes with path
> to submodule, and this conflict with the ability to rename and move
> submodules.

Yes. This whole .gitlink thing is more or less about trying to
avoid, as far as possible, any path configuration in the supermodule
which would have to be changed when the user moves or even deletes
the submodule. Exactly for the latter case, we want the GITDIR for
submodules to be separate.

> >> GIT_DIR = path to base git repository
> >> it is equivalent to setting the following:
> >> 
> >> GIT_INDEX_FILE = path to index file
> >> GIT_OBJECT_DIRECTORY = path to object directory
> >> GIT_HEAD_FILE = path to HEAD file
> >> GIT_REFS_DIRECTORY = path to refs directory
> > 
> > AFAIK the latter two do not exist yet, or do they?
> 
> They do not exist; perhaps they should for completeness.

Actually, I am fine with allowing them in .gitlink. This makes
the whole thing much more flexible.

> [...] 
> > It is enough if GITDIR and NAME is given. With GITDIR_REAL after the
> > smart lookup, e.g. GIT_INDEX_FILE would default to $GITDIR_REAL/external/$NAME
> > and so on.
> 
> Not $GITDIR_REAL/submodules/<name>/index (or modules instead of
> submodules)?

Ooops, yes.
I am not actually sure what's the best name here: "external", "submodule", ... ?
I thought the SVN name also fits for the submodule case. The submodule
is independent, and possibly comes from an external git repository.

> >> NAME = name
> >> should match "name subdirectory" entry in modules file in superproject.
> > 
> > Yes.
> > This would be in my next proposal about how to build the submodule support
> > on light-checkouts ;-)
> 
> I have thought that with "each submodule as separate repository" approach
> to submodules the modules file would have module name and either
> subdirectory in which submodule resides, or GIT_DIR of submodule. And
> this file could be generated on checkout... which doesn't survive closer
> scrutiny.

Of course, that is a simpler approach. But I think the .gitlink thing
really is more flexible without being more complex.

> > Ah, yes, I see. Perhaps this makes sense with absolute paths:
> > 
> > 	/home/user/repos/.../linux
> 
> You mean that the above means to check the following paths:
> 
>   /home/user/repos/linux
>   /home/user/linux
>   /home/linux
>   /linux

No.

> not the searching subdirectories of /home/user/repos for linux
> directory (there can be many)?

Yes. But you can scratch this.



* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Jakub Narebski @ 2006-12-08 23:53 UTC
  To: Josef Weidendorfer; +Cc: git

Josef Weidendorfer wrote:
> On Friday 08 December 2006 23:54, Josef Weidendorfer wrote:
>> Jakub Narebski wrote:
>>> NAME = name
> 
> Forgot to mention in the proposal:
> If you recursively have light-weight checkouts inside each other,
> the real "name" (for .git/external/<name>/ and for further submodule
> configuration e.g. in .git/modules of the base repository)
> should of course be the concatenation of the names in the .gitlink
> files while going up to the base repository.

Why concatenation? I thought the name would be the ID of the submodule,
and should just be somehow unique.

And if concatenation, perhaps with some forbidden character inserted between
them? Like '/' for example ;-)
 
>>> Perhaps instead of adding arbitrary number of .. in front of
>>> relative path, we better use some magic, like ... for finding
>>> somewhere up? 
> 
> No need. Something like
> 
>> 	/home/user/.../linux
> 
> is crazy. Do you want to scan all of your home directory every time
> this lookup is needed? So "..." really only makes sense in front of
> the relative path, but there, you also can leave it out.

No. I meant /home/user/.../linux to mean searching for
  /home/user/linux
  /home/linux
  /linux
but I don't think it is useful. As to relative path matching in any
parent directory... well, that differs only in direction (up instead of 
down) in matching filename in .gitignore when path does not contain /
(well, actually it is taken as fileglob).
-- 
Jakub Narebski


* Re: [RFC] Light-weight checkouts via ".gitlink"
From: Josef Weidendorfer @ 2006-12-09 1:46 UTC
  To: Jakub Narebski; +Cc: git

On Saturday 09 December 2006 00:53, Jakub Narebski wrote:
> Josef Weidendorfer wrote:
> > On Friday 08 December 2006 23:54, Josef Weidendorfer wrote:
> >> Jakub Narebski wrote:
> >>> NAME = name
> > 
> > Forgot to mention in the proposal:
> > If you recursively have light-weight checkouts inside each other,
> > the real "name" (for .git/external/<name>/ and for further submodule
> > configuration e.g. in .git/modules of the base repository)
> > should of course be the concatenation of the names in the .gitlink
> > files while going up to the base repository.
> 
> Why concatenation? I thought the name would be ID of submodule,
> and should be just somehow unique.
> 
> And if concatenation, pehaps some forbidden character inserted between
> them? Like '/' for example ;-)

Yes, you are right.
Nesting of submodules really is an important issue. The .gitlink
file allows us to put the submodule GITDIR somewhere in the supermodule's
GITDIR. The idea is that you can clone the submodule GITDIR if you
want. With submodule "inner" nested inside of submodule "outer",
the GITDIR of "outer" should have the GITDIR of "inner" inside to allow
for cloning "outer" together with its submodule "inner".

So it is not enough to have a submodule "outer" and a submodule
"outer/inner" in the supermodule. We want

	super.git/ext/outer.git/

to be the GITDIR for submodule "outer", and inside of it

	super.git/ext/outer.git/ext/inner.git

to be the GITDIR for submodule "inner".
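
The nested layout above could be computed as in the following sketch.
The function name nested_gitdir is hypothetical; the "ext" directory and
the ".git" suffix are taken from the example paths.

```python
import posixpath


def nested_gitdir(super_gitdir, submodule_names, area="ext"):
    """Return the GITDIR of a nested submodule.

    submodule_names lists the submodules from the outermost to the
    innermost one; each level lives in <parent GITDIR>/ext/<name>.git,
    so cloning an outer GITDIR brings its inner GITDIRs along.
    """
    gitdir = super_gitdir
    for name in submodule_names:
        gitdir = posixpath.join(gitdir, area, name + ".git")
    return gitdir
```

Called with "super.git" and the names "outer" and "inner", this yields
super.git/ext/outer.git/ext/inner.git.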

Actually, it would be nice if this .gitlink proposal did not have to
deal with it. Instead, .gitlink should be flexible enough to allow such
nesting if needed.

