git.vger.kernel.org archive mirror
* Feature request: thin checkout
@ 2007-06-15  8:53 linux
  2007-06-15  9:49 ` Rogan Dawes
  2007-06-15 10:44 ` Johannes Sixt
  0 siblings, 2 replies; 3+ messages in thread
From: linux @ 2007-06-15  8:53 UTC (permalink / raw)
  To: git; +Cc: linux

Git packs so well that it's very common for the unpacked source to be much
larger than the history in .git.  The linux-kernel archive is a prime example.

I've also started using git-svn (awesome tool, BTW) and have discovered
the impressive disk space costs associated with SVN's tags/ directories
if I actually want to download the full history.

If you have multiple cloned repositories on one system, git can share
the history, but the working directory problem is exacerbated.
(Disk is cheap, but the RAM to cache it is limited.)

This got me thinking...
Wouldn't it be nice if there were a way to tell git-update-index and
git-checkout-index that certain directories are not in the working
directory, and that this is fine?  Just pretend they exist and match the index.

Then I could mark much of arch/* as "don't bother" and save a pile of
disk space per working directory.

This would be a little bit annoying if I tried to merge two branches with
conflicts in a "masked" part of the tree (well, it would create the index
entries, but I'd have no way to resolve the conflict), but I think that's
a matter of Don't Do That.

A slightly more flexible (but confusing?) option would be to mark parts
of the tree as "don't commit deletion".  That is, within named sections
of the tree:
- Missing files in the working directory are assumed unchanged from
  the index.  (Perhaps unless you explicitly git-add them.)
- Files that don't already exist aren't checked out from the index.
  (Unless explicitly named in a git-checkout operation.)
... but you could have a "selective checkout" in some directories.
E.g. in the kernel, you could include a stub Makefile, but omit
the .c files for file systems you don't need.

(And if we're really sneaky, teach the linux kernel Makefile how to check
out code when features are enabled.  That would address a longstanding
complaint about the size of the linux kernel source tree.  It's a
bit trickier than default make rules for getting <foo> from RCS/foo,v
because git doesn't provide a single file that make(1) can look for,
but something's probably possible.)

That could also handle merges... the "check out the file with conflict
markers" operation could be unconditional if there are conflicts.


The multiple-git-repositories issue could be handled by hard-linking
the working directory files together (assuming your editor knows how to
unlink when changing them) using information easily available in the
index files.  Git could even detect and complain if two files that
mismatched their index entries were hard-linked together.
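
A rough sketch of that detection step, in Python rather than inside git
itself.  Everything here is an assumption made for illustration: the
function names are hypothetical, and the "expected" mapping stands in for
blob ids read from each working directory's index.  Only the blob hashing
(SHA-1 over a "blob <size>\0" header plus the content) follows git's
object format.

```python
import hashlib
import os
from collections import defaultdict


def blob_id(data: bytes) -> str:
    """Hash file content the way git hashes a blob object."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def mismatched_hardlinks(expected):
    """Return paths that share an inode with another listed path
    while their content no longer matches the expected blob id.

    `expected` maps a working-file path to the blob id its index records
    (hypothetical input; a real tool would read the index files).
    """
    groups = defaultdict(list)
    for path in expected:
        st = os.stat(path)
        if st.st_nlink > 1:
            # Same device and inode means the paths are hard links.
            groups[(st.st_dev, st.st_ino)].append(path)

    problems = []
    for paths in groups.values():
        if len(paths) < 2:
            continue  # linked to something outside the listed paths
        with open(paths[0], "rb") as f:
            actual = blob_id(f.read())
        problems.extend(p for p in paths if expected[p] != actual)
    return sorted(problems)
```

An editor that rewrites a file in place, instead of unlinking it first,
changes every linked copy at once; that is the case this check would flag.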

But for the git-svn case where you have a tags/ directory full of old
copies of files, hard-linking is of limited use if most files changed
between tags.  Here, just being able to say "don't bother populating
that part of the working directory" would be very nice.


Does this make sense to anyone else?


* Re: Feature request: thin checkout
From: Rogan Dawes @ 2007-06-15  9:49 UTC (permalink / raw)
  To: linux; +Cc: git

linux@horizon.com wrote:
> Git packs so well that it's very common for the unpacked source to be much
> larger than the history in .git.  The linux-kernel archive is a prime example.
> 
> I've also started using git-svn (awesome tool, BTW) and have discovered
> the impressive disk space costs associated with SVN's tags/ directories
> if I actually want to download the full history.
> 
> If you have multiple cloned repositories on one system, git can share
> the history, but the working directory problem is exacerbated.
> (Disk is cheap, but the RAM to cache it is limited.)
> 
> This got me thinking...
> Wouldn't it be nice if there were a way to tell git-update-index and
> git-checkout index that certain directories are not in the working
> directory, but don't worry.  Just pretend they exist and match the index.
> 

I think that update-index is able to do (some of) this already:

$ man git-update-index

SYNOPSIS
        git-update-index

                     [--cacheinfo <mode> <object> <file>]*


        --cacheinfo <mode> <object> <path>
               Directly insert the specified info into the index.


USING --CACHEINFO OR --INFO-ONLY
        --cacheinfo is used to register a file that is not in the current
        working directory. This is useful for minimum-checkout merging.

        To pretend you have a file with mode and sha1 at path, say:

        $ git-update-index --cacheinfo mode sha1 path

        --info-only is used to register files without placing them in the
        object database. This is useful for status-only repositories.

        Both --cacheinfo and --info-only behave similarly: the index is
        updated but the object database isn't. --cacheinfo is useful when
        the object is in the database but the file isn't available locally.
        --info-only is useful when the file is available, but you do not
        wish to update the object database.

At any rate, it looks like some of the infrastructure already exists,
even if the complete solution doesn't.

I *guess* it might even be as simple as maintaining a list of 
"uncheckedout files with mode and sha" in the .git directory, and 
merging that with what has actually been checked out when updating the 
index.

i.e.

$ git checkout master:src/drivers

Get the <tree> object for master. Step through each entry. If the 
requested path falls under the entry, recurse into it, checking out the 
required files, otherwise write the <tree/file> info into 
.git/partialcheckout.
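
A minimal sketch of that walk, in Python for brevity.  The nested dict
standing in for a <tree> object, the function name split_tree, and the
(path, kind) records are all assumptions for illustration; none of this
is existing git code or an existing file format.

```python
def split_tree(tree, wanted, prefix=""):
    """Walk a tree and split it into files to check out and entries
    to record as not checked out.

    `tree` maps a name to a blob id (str) or to a nested dict (subtree).
    `wanted` is the requested path, e.g. "src/drivers".
    Returns (checkout, skipped): file paths to write to disk, and
    (path, kind) records for whatever was left out.
    """
    checkout, skipped = [], []
    for name, entry in sorted(tree.items()):
        path = prefix + name
        # The entry is the requested path or lies below it.
        inside = path == wanted or path.startswith(wanted + "/")
        # The requested path lies somewhere below this entry.
        ancestor = wanted.startswith(path + "/")
        if isinstance(entry, dict):
            if inside or ancestor:
                sub_checkout, sub_skipped = split_tree(entry, wanted, path + "/")
                checkout += sub_checkout
                skipped += sub_skipped
            else:
                # Record the whole subtree once instead of every file in it.
                skipped.append((path, "tree"))
        elif inside:
            checkout.append(path)
        else:
            skipped.append((path, "blob"))
    return checkout, skipped
```

Recording a skipped subtree as a single entry keeps the list short when
large parts of the tree are left out.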

Hack, hack, hack in src/drivers.

When you want to check what part of the tree is dirty, check if 
.git/partialcheckout exists. If it does, read through each entry, 
comparing them to the index. Then, for the entries that are not in 
partialcheckout, but are in the index, actually go to the filesystem to 
check stat for each file.
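
Roughly, as a Python sketch.  The three dicts are stand-ins chosen for
illustration: "index" and "skipped" map paths to blob ids, and "worktree"
plays the role of stat'ing and hashing the files on disk.  The function
name is hypothetical, and for simplicity skipped entries are tracked per
file rather than per tree.

```python
def dirty_paths(index, skipped, worktree):
    """Return the sorted paths whose state differs from the index.

    Entries listed in `skipped` are never looked up on disk; they are
    only compared against the id recorded when they were skipped.
    Everything else is compared against the working tree, where a
    missing file counts as a change.
    """
    dirty = []
    for path, index_id in sorted(index.items()):
        if path in skipped:
            if skipped[path] != index_id:
                dirty.append(path)
        elif worktree.get(path) != index_id:
            dirty.append(path)
    return dirty
```

The point of the split is that a skipped path never touches the
filesystem, so its absence from disk is not reported as a deletion.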

Not quite sure how to handle something like:

$ git checkout master:src/drivers/scsi
$ git checkout master:src/drivers/usb

I guess one would have to trim entries from .git/partialcheckout as they 
are actually fully checked out.

Rogan


* Re: Feature request: thin checkout
From: Johannes Sixt @ 2007-06-15 10:44 UTC (permalink / raw)
  To: git

linux@horizon.com wrote:
> Wouldn't it be nice if there were a way to tell git-update-index and
> git-checkout index that certain directories are not in the working
> directory, but don't worry.  Just pretend they exist and match the index.
> 
> Then I could mark much of arch/* as "don't bother" and save a pile of
> disk space per working directory.

Currently, directories are not registered in the index. For this reason,
empty directories cannot be versioned. One day, we will want to support
this, and for this purpose, directories must also be registered in the
index. Once this infrastructure is in place, the "don't bother" flag
will be a no-brainer, methinks.

-- Hannes
