* read-only working copies using links
@ 2009-01-24 9:17 Chad Dombrova
2009-01-24 11:02 ` Sverre Rabbelier
0 siblings, 1 reply; 6+ messages in thread
From: Chad Dombrova @ 2009-01-24 9:17 UTC (permalink / raw)
To: git
hi all,
there's a major feature for working with large binaries that has not
yet been addressed by git: the ability to check out a file as a
symbolic/hard link to a blob in the repository, instead of duplicating
the file into the working copy.
imagine a scenario where one user is putting large binary files into a
git repo on a networked server. 100 other users on the server need
read-only access to this repo. they clone the repo using --shared or
--local, which saves disk space for the object files, but each of
these 100 working copies also creates copies of all the binary files
at the HEAD revision. it would be 100x as efficient in both disk space
and checkout speeds if, in place of these files, symbolic or hard
links were made to the blob files in .git/objects.
the crux of the issue is that the blob objects would have to be stored
as exact copies of the original files. it would seem there are two
things that currently prevent this from happening. 1) blobs are
stored with compression and 2) they include a small header.
compression can be disabled by setting core.loosecompression to 0, so
that seems like less of an issue. as for the header, wouldn't it be
possible to store it separately? in other words, store two files per
blob directory, a small stub file with the header info and the
unaltered file data.
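For reference, here is a minimal sketch of how a loose blob is laid out today, which is why the file under .git/objects is never a byte-for-byte copy of the original. The content is prefixed with a `blob <size>\0` header, the result is hashed with SHA-1 to get the object name, and the same bytes are passed through zlib. The function name `loose_blob` and its `level` parameter are illustrative assumptions, not git's own code.

```python
import hashlib
import zlib
from typing import Tuple


def loose_blob(content: bytes, level: int = 1) -> Tuple[str, bytes]:
    """Return (object id, on-disk bytes) for a loose blob.

    `level` stands in for the zlib compression level; it is a
    hypothetical parameter for this sketch.
    """
    # The header records the object type and the content length.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    # The object name is the SHA-1 of header + content.
    oid = hashlib.sha1(store).hexdigest()
    # What lands on disk is the zlib stream of header + content.
    return oid, zlib.compress(store, level)


oid, on_disk = loose_blob(b"hello\n")
print(oid)

# Level 0 stores the bytes uncompressed, but still inside a zlib wrapper.
_, stored = loose_blob(b"hello\n", level=0)
print(stored != b"hello\n")
```

Even at level 0 the output is a zlib stream wrapping the header plus the content, so turning compression off does not by itself yield a file that could be linked into a working copy.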
what are the caveats to a system like this? has anyone looked into
this before?
-chad
p.s.
i tried submitting a post through nabble a few days ago and it said that
it was still pending, so i thought i'd try submitting directly to the
mailing list. sorry if i end up double-posting.
* Re: read-only working copies using links
2009-01-24 9:17 read-only working copies using links Chad Dombrova
@ 2009-01-24 11:02 ` Sverre Rabbelier
2009-01-24 18:39 ` Chad Dombrova
0 siblings, 1 reply; 6+ messages in thread
From: Sverre Rabbelier @ 2009-01-24 11:02 UTC (permalink / raw)
To: Chad Dombrova, Tim 'Mithro' Ansell; +Cc: git
Heya,
On Sat, Jan 24, 2009 at 10:17, Chad Dombrova <chadrik@gmail.com> wrote:
> the crux of the issue is that the blob objects would have to be stored as
> exact copies of the original files. it would seem there are two things that
> currently prevent this from happening. 1) blobs are stored with compression
> and 2) they include a small header. compression can be disabled by setting
> core.loosecompression to 0, so that seems like less of an issue. as for the
> header, wouldn't it be possible to store it separately? in other words,
> store two files per blob directory, a small stub file with the header info
> and the unaltered file data.
I think Tim Ansell (cced) was talking about this at the gittogether
(storing the metadata separately), as it would benefit sparse/narrow
checkout, another advantage supporting his case?
--
Cheers,
Sverre Rabbelier
* Re: read-only working copies using links
2009-01-24 11:02 ` Sverre Rabbelier
@ 2009-01-24 18:39 ` Chad Dombrova
2009-01-24 18:43 ` Sverre Rabbelier
2009-01-24 19:34 ` Jeff King
0 siblings, 2 replies; 6+ messages in thread
From: Chad Dombrova @ 2009-01-24 18:39 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Tim 'Mithro' Ansell, git
>
> I think Tim Ansell (cced) was talking about this at the gittogether
> (storing the metadata separately), as it would benefit sparse/narrow
> checkout, another advantage supporting his case?
>
what's the case against it, other than the obvious, that it will take
more work?
-chad
* Re: read-only working copies using links
2009-01-24 18:39 ` Chad Dombrova
@ 2009-01-24 18:43 ` Sverre Rabbelier
2009-01-24 19:35 ` Jeff King
2009-01-24 19:34 ` Jeff King
1 sibling, 1 reply; 6+ messages in thread
From: Sverre Rabbelier @ 2009-01-24 18:43 UTC (permalink / raw)
To: Chad Dombrova; +Cc: Tim 'Mithro' Ansell, git
On Sat, Jan 24, 2009 at 19:39, Chad Dombrova <chadrik@gmail.com> wrote:
> what's the case against it, other than the obvious, that it will take more
> work?
Good question, I think it was mostly that, someone has to implement it
(possibly as part of packv4). Backwards compatibility is of course
always a concern, but I'm not too familiar with the subject; perhaps
other people on the list (or even those who were at the gittogether) can
comment?
--
Cheers,
Sverre Rabbelier
* Re: read-only working copies using links
2009-01-24 18:39 ` Chad Dombrova
2009-01-24 18:43 ` Sverre Rabbelier
@ 2009-01-24 19:34 ` Jeff King
1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2009-01-24 19:34 UTC (permalink / raw)
To: Chad Dombrova; +Cc: Sverre Rabbelier, Tim 'Mithro' Ansell, git
On Sat, Jan 24, 2009 at 10:39:46AM -0800, Chad Dombrova wrote:
>> I think Tim Ansell (cced) was talking about this at the gittogether
>> (storing the metadata separately), as it would benefit sparse/narrow
>> checkout, another advantage supporting his case?
>
> what's the case against it, other than the obvious, that it will take
> more work?
I'm not sure this is actually the same as Tim's proposal. Tim wanted to
store the commit and tree information separately from the blob
information (since his use case was that blobs are enormous, but the
rest is reasonable).
AIUI, Chad's proposal is about storing the actual blob data itself
separate from the blob object's metadata (i.e., its object type and
length headers). Which means that the normal loose object format is not
acceptable, and you would end up with something like (for example):
    .git/objects/pack/pack-full-of-your-regular-stuff.{pack,idx}
    .git/objects/[0-9a-f]{2}/[0-9a-f]{38}/header
    .git/objects/[0-9a-f]{2}/[0-9a-f]{38}/data
or something similar. Then you could hardlink directly to the 'data'
portion. So you would need:
- to teach everything that ever looks for loose objects how to read
this new format. In theory, it's all nicely encapsulated in
sha1_file.c
- to teach checkout routines to hardlink such a case instead of
copying the file
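To make the two requirements above concrete, here is a rough sketch of what a split object plus a hardlinking checkout could look like. It follows the header/data layout described in this message, but nothing here exists in git: `write_split_blob` and `checkout_by_link` are hypothetical names, and the sketch assumes the object store and the working tree sit on the same filesystem, since hard links cannot cross filesystems.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_split_blob(objects_dir: Path, content: bytes) -> str:
    """Store a blob as <2 hex>/<38 hex>/{header,data} (hypothetical layout)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    # The object name is still the SHA-1 of header + content.
    oid = hashlib.sha1(header + content).hexdigest()
    obj_dir = objects_dir / oid[:2] / oid[2:]
    obj_dir.mkdir(parents=True, exist_ok=True)
    (obj_dir / "header").write_bytes(header)
    # 'data' is an exact, uncompressed copy of the file.
    (obj_dir / "data").write_bytes(content)
    return oid


def checkout_by_link(objects_dir: Path, oid: str, dest: Path) -> None:
    """Check out a blob by hardlinking its 'data' file instead of copying."""
    data = objects_dir / oid[:2] / oid[2:] / "data"
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.link(data, dest)


def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        oid = write_split_blob(root / "objects", b"large binary payload\n")
        dest = root / "worktree" / "asset.bin"
        checkout_by_link(root / "objects", oid, dest)
        # Both names now refer to the same file on disk.
        print(os.path.samefile(dest, root / "objects" / oid[:2] / oid[2:] / "data"))


demo()
```

A real implementation would also need the fallback to a regular copy when the link cannot be made, which this sketch leaves out.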
The obvious downsides that I can think of are:
- it has the potential to make object reading, which is a core part of
git (read: very performance- and correctness-sensitive) a lot more
complex. But maybe the implementation would not be that painful;
somebody would have to look very closely to see.
- it interacts badly with smudge/clean filters and crlf conversion.
In those cases you can't hardlink. If you treat this like an
optimization, though, it's not so bad: we only do the optimization
when we _can_, and fall back to regular checkout if those other
options are in effect.
- it's somewhat dangerous to your repository's health. Git's model is
that object files are immutable (since they are, after all, named
after their contents). But now you are linking them into your
working tree, which makes them susceptible to some third party tool
munging them. So yes, most tools will probably behave, but any tool
that misbehaves will actually corrupt your repository.
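The last downside can be shown with a small sketch. It assumes the same hypothetical 'data' file layout as above and a tool that edits the working-tree file in place; `blob_id` and `demo_corruption` are names made up for this example. Because the working-tree path and the object share one file, the in-place write changes the object, and its contents no longer hash to its name.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Tuple


def blob_id(content: bytes) -> str:
    """SHA-1 object name of a blob with the given content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def demo_corruption() -> Tuple[bool, bool]:
    """Return (object valid before edit, object valid after edit)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        content = b"original bytes\n"
        oid = blob_id(content)

        # Hypothetical uncompressed object file, as in the proposal.
        obj = root / "objects" / oid[:2] / oid[2:] / "data"
        obj.parent.mkdir(parents=True)
        obj.write_bytes(content)

        # Working-tree file is a hard link to the object.
        work = root / "worktree" / "big.bin"
        work.parent.mkdir()
        os.link(obj, work)

        ok_before = blob_id(obj.read_bytes()) == oid

        # A misbehaving tool overwrites the working-tree file in place.
        with open(work, "r+b") as f:
            f.write(b"MUNGED")

        ok_after = blob_id(obj.read_bytes()) == oid
        return ok_before, ok_after


print(demo_corruption())
```

A tool that instead writes a new file and renames it over the old one only replaces the working-tree name and leaves the object untouched, which is why well-behaved tools are safe and only in-place writers are a problem.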
-Peff
* Re: read-only working copies using links
2009-01-24 18:43 ` Sverre Rabbelier
@ 2009-01-24 19:35 ` Jeff King
0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2009-01-24 19:35 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Chad Dombrova, Tim 'Mithro' Ansell, git
On Sat, Jan 24, 2009 at 07:43:20PM +0100, Sverre Rabbelier wrote:
> On Sat, Jan 24, 2009 at 19:39, Chad Dombrova <chadrik@gmail.com> wrote:
> > what's the case against it, other than the obvious, that it will take more
> > work?
>
> Good question, I think it was mostly that, someone has to implement it
> (possibly as part of packv4). Backwards compatibility is of course
> always a concern, but I'm not too familiar with the subject; perhaps
> other people on the list (or even those who were at the gittogether) can
> comment?
If I understand his proposal correctly, such objects must _not_ be part
of a pack. The whole idea is splitting them _more_, not less.
-Peff